Abstract

In the query-commit problem we are given a graph where edges have distinct probabilities
of existing. It is possible to query the edges of the graph, and if the queried
edge exists then its endpoints are irrevocably matched. The goal is to find a querying 
strategy which maximizes the expected size of the matching obtained. This stochastic 
matching setup is motivated by applications in kidney exchanges and online dating. 

In this paper we address the query-commit problem from both theoretical and 
experimental perspectives. First, we show that a simple class of edges can be queried 
without compromising the optimality of the strategy. This property is then used to 
obtain in polynomial time an optimal querying strategy when the input graph is sparse. 
Next we turn our attention to the kidney exchange application, focusing on instances
modeled over real data from existing exchange programs. We prove that, as the number 
of nodes grows, almost every instance admits a strategy which matches almost all nodes. 
This result supports the intuition that more exchanges are possible on a larger pool
of patient/donor pairs and gives theoretical justification for unifying the existing exchange
programs. Finally, we experimentally evaluate different querying strategies over kidney
exchange instances. We show that even very simple heuristics perform fairly well, being
within 1.5% of an optimal clairvoyant strategy that knows in advance the edges in the
graph. In such a time-sensitive application, this result motivates the use of committing
strategies.

1 Introduction 

The theory of matchings is one of the most developed parts of graph theory and
combinatorics [H]. Matchings can be used in a variety of situations, ranging from the allocation
of workers to workplaces to the exchange of kidneys among living donors [22]. However, the
uncertainty present in most applications is not captured by standard models. In order to 
address this limitation we consider a stochastic variant of matchings. 

Before presenting the query-commit problem we describe an application in kidney exchanges
which motivates our model. Unfortunately, the patients who currently require a kidney
transplant far outnumber the available organs. In the United States alone, more than 84,000
patients were waiting for a kidney in 2010, and 4,268 people in this situation died in 2008
[27]. However, a distinguishing characteristic of kidney transplants is that they
can be carried out using the organ of a living donor, usually a relative of the patient. Such
operations have great potential to alleviate the long waiting times for transplants.

One major issue is that many operations cannot be executed due to incompatibility 
between a patient and its donor. In order to overcome this, it is important to consider 2-way 
exchanges: Suppose patient A and its willing donor A' are not compatible and the same 
holds for B and B'; however, it is still possible that both patients can receive the required 
organ via transplants from A' to B and from B' to A. This situation can be modeled using a 
compatibility graph, where each node represents a patient/donor pair and edges represent the 
cross-compatibility between such pairs. The set of transplants that maximize the number of 
organs received is then given by a maximum matching in this graph [22, 23].

However, such a model does not take into account uncertainty in the compatibility graph. 
In practice, preliminary tests such as blood-type and antigen screening are used to determine 
only the likelihood of cross-compatibility between pairs. Final compatibility can only be 
determined using a time-consuming test called crossmatching, which involves combining 
samples of the recipients' and donors' blood to check their reactivity. Furthermore, such a 
test must be performed close to the surgery date, since even the administration of certain 
drugs may affect compatibility. That is, the transplant should be executed as soon as
two patient/donor pairs are determined to be cross-compatible.

The kidney exchange application motivates the query-commit problem, which can be 
described briefly as follows. We are given a weighted graph G where the weight $p_e$ indicates
the probability of existence of edge e. In each time step we can query an edge e of G and
one of the following happens: with probability $p_e$ (corresponding to the event that e actually
exists) its endpoints are irrevocably matched and removed from the graph; with probability
$1 - p_e$ (corresponding to the event that e does not exist) e is removed from the graph. Notice
that at the end of this procedure we obtain a matching in G, dependent on both the choices 
of the queries and the randomness of the edges' existence. In the query-commit problem our 
goal is to obtain a query strategy that maximizes the expected cardinality of the matching 
obtained. 
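
As an illustration of the process just described, the following minimal Python sketch simulates a single run of a strategy that queries edges in a fixed priority order and skips edges that are no longer permissible. The function name `run_query_commit`, the dictionary representation of the graph and the restriction to a fixed (non-adaptive) order are assumptions made for the example; the model itself allows adaptive strategies.

```python
import random


def run_query_commit(prob, order, rng):
    """Simulate one run of a fixed-order query-commit strategy.

    prob:  dict mapping an edge (u, v) to its existence probability p_e
    order: list of edges giving the (hypothetical) query priority
    rng:   a random.Random instance used to draw edge existence
    Returns the list of matched edges.
    """
    matched_nodes = set()
    matching = []
    for (u, v) in order:
        # An edge touching an already matched node is no longer permissible.
        if u in matched_nodes or v in matched_nodes:
            continue
        # Query the edge: it exists with probability p_e, and if it does
        # its endpoints are irrevocably matched.
        if rng.random() < prob[(u, v)]:
            matching.append((u, v))
            matched_nodes.update((u, v))
    return matching
```

On a path a-b-c-d whose edges all exist with probability 1, querying the middle edge first yields a matching of size 1, whereas querying the two end edges first yields a matching of size 2; the order of the queries therefore matters.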

Our results. In this paper we address the query-commit problem from both theoretical 
and experimental perspectives. First, we show that a simple class of edges can be queried 
without compromising the optimality of the strategy. This result can be used to simplify the 
decision making process by reducing the search space. In order to illustrate this, we show 
that employing this property we can obtain in polynomial time an optimal querying strategy 
when the input graph is sparse. 

Then we turn our attention to the kidney exchange application, more specifically to instances
of the query-commit problem modeled over real data from existing kidney exchange
programs. In this context we are able to prove the following result: as the number of nodes
grows, almost every such graph admits a strategy which matches almost all nodes. This result
supports the intuition that more exchanges are possible on a larger pool of patient/donor pairs
[H, 26]. More importantly, it shows the potential gains of merging current kidney exchange
programs into a nationwide bank.

Finally, we propose and experimentally evaluate different querying strategies, again focusing
on the kidney exchange application. We show that even very simple heuristics perform
fairly well. Surprisingly, the best among these strategies are on average within 1.5% of an
optimal clairvoyant, or non-committing, strategy that knows in advance the edges in the graph.
This indicates that the committing constraint is not too stringent in this application. Thus,
in such a time-sensitive application, this result motivates the use of committing strategies.

Related work. [6] recently introduced a generalization of the query-commit problem which
contains the extra constraint that a strategy cannot query too many edges incident on the
same node. In addition to kidney exchange, the authors also point out the usefulness of this
model in the context of online dating. Their main result is that a simple greedy querying
strategy is within a factor of 1/4 of an optimal strategy. Other work uses a combination of
two strategies in order to obtain an improved approximation factor of 1/3.88. A
further extension considers edges with values, where the goal is to find a strategy that maximizes
the expected value of the matching obtained; using an LP-based approach, a strategy
within a constant factor of an optimal strategy is obtained.

We note, however, that in the case of the query-commit problem (i.e. when there is 
no constraint on the number of edges incident to a node that a strategy can query) every 
strategy which does not stop before querying all permissible edges is a 1/2-approximation [0]. 
This follows from two easy facts: (i) for every outcome of the randomness from the edges, 
such a strategy obtains a maximal matching and (ii) every maximal matching is within a 
factor of 1/2 from a maximum matching. 

The query-commit problem is similar in nature to other stochastic optimization problems 
with irrevocable decisions, such as stochastic knapsack [8] and stochastic packing integer
programs [9]. In both [8, 9] the authors present approximation algorithms as well as bounds
on the benefit of using an adaptive strategy versus a non-adaptive one. 

Different forms of incorporating uncertainty have also been studied [2^ and in particular
stochastic versions of classical combinatorial problems have been considered in the literature
[12, 20]. Matching also has variants which handle uncertainty via a 2-stage model [14]
or in an online fashion [51 dUl IH [131 [13 [TH]. The latter line of research has been largely 
motivated by the increasing importance of Internet advertisement. 

The kidney exchange problem, in its deterministic form, has received a great deal of
attention in the past few years [H, 21, 22, 23, 26]. In the previous section we argued that 2-way
exchanges can increase the number of organs transplanted, but of course larger chains of
exchanges can offer even bigger improvements. Unfortunately, considering larger exchanges
makes the problem of finding optimal transplant assignments much harder computationally
even if all the edges are known in advance [1]. Nonetheless, [1] present integer programming
based algorithms which are able to solve large instances of the problem, on scenarios with up
to 10,000 patient/donor pairs. The authors point out, however, the importance of considering
other models which take into account the uncertainty in the compatibility graph. Finally,
[2, 28, 29] address the dynamic aspect of exchange banks, where the pool of patients and
donors evolves over time.

The remainder of the paper is organized as follows. In Section 2 we present a more
formal definition of the query-commit problem as well as multiple ways of seeing the process
in which matchings are obtained from strategies. Then we prove the important structural
property (Lemma 1) which is used to obtain in polynomial time optimal strategies for sparse
graphs (Section 3). Moving to the kidney exchange application, we describe in Section 4 a
model for generating realistic compatibility graphs and prove that, as the number of nodes
grows, most of these instances admit strategies which match almost all nodes. In Section 5
we address the issue of estimating the value of a strategy as well as computing upper bounds 
on the optimal solution, and conclude by presenting experimental evaluation of querying 
strategies. As a final remark, the proofs of all lemmas which are not presented in the text 
are available in the appendix. 

2 Preliminaries 

We use $V(G)$ and $E(G)$ to denote respectively the set of nodes and edges of a given undirected
graph G, and use $v(G)$ and $e(G)$ to denote their cardinalities. In addition we use $\mu(G)$ to
denote the cardinality of a maximum matching in G. We define the neighborhood of a
node u as the set $N(u) = \{v \in G : (u,v) \in G\}$ and the (edge) neighborhood of an edge
$(u,v)$ as the set $N((u,v)) = \{(x,y) \in G : \{u,v\} \cap \{x,y\} \neq \emptyset\}$. Notice that an edge is included in
its own neighborhood. We ignore isolated nodes in all subsequent graphs.

When G is a rooted tree, root(G) denotes its root and $G_u$ is the subtree of G which
contains u and all of its descendants. When G is a binary tree, we use $l(u)$ and $r(u)$ to
denote respectively the left and right children of a node $u \in G$; we also call $G_{l(u)}$ and $G_{r(u)}$
respectively the left subtree and right subtree of node u. The height of a rooted tree is the
length (in number of nodes) of the longest path between the root and a leaf.

Throughout the paper we will be interested in weighted graphs $G = (V,E,p)$ where
$p : E \to (0,1]$ associates nonzero weights to the edges of G; we refer to them simply as
weighted graphs. A scenario or realization $\sigma$ of G, denoted by $\sigma \sim G$, is a subgraph of G
obtained by including each edge e independently with probability $p_e$. Note there can be up
to $2^{|E|}$ possible realizations of G.
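
To make the notion of a realization concrete, the sketch below enumerates every scenario of a small weighted graph together with its probability. The name `realizations` and the dictionary encoding of the weights are assumptions of the example; the enumeration is exponential in the number of edges and is only meant for tiny instances.

```python
from itertools import product


def realizations(prob):
    """Yield (scenario, probability) for every realization of a weighted graph.

    prob: dict mapping each edge to its existence probability p_e.
    A scenario is returned as a frozenset of the edges that exist.
    """
    edges = list(prob)
    for mask in product([False, True], repeat=len(edges)):
        weight = 1.0
        present = []
        for edge, keep in zip(edges, mask):
            # Edges are included independently of each other.
            weight *= prob[edge] if keep else 1.0 - prob[edge]
            if keep:
                present.append(edge)
        yield frozenset(present), weight
```

For a graph with three edges the generator produces eight scenarios whose probabilities sum to one.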

Now we describe, somewhat informally, the dynamics of querying strategies. Consider a
weighted graph G, a scenario $\sigma \sim G$ and a querying strategy S. We start with an empty
matching and S makes its first query for an edge e of G. If $e \in \sigma$ then e is added to
the current matching and we obtain the residual graph (i.e. the set of permissible edges)
$R = G \setminus N(e)$. If $e \notin \sigma$ then e is not added to the matching and the residual graph is
$R = G \setminus e$. At this point S queries any other edge of G; usually we focus on the case that
the new edge belongs to R, since edges outside R cannot be added to the matching. The
process then continues in the same fashion. We remark that S is oblivious to the scenario $\sigma$
and only uses information from previous queries in order to decide its next query.

In order to make this process more precise we use decision trees to represent querying
strategies. In our context, a decision tree T is a binary tree with the following properties:
(i) each internal node $x \in T$ corresponds to a query for an edge $a(x) \in G$; (ii) all nodes in
the right subtree of x correspond to queries for edges in $G \setminus N(a(x))$; (iii) all nodes in the
left subtree of x correspond to queries for edges in $G \setminus a(x)$. It is also useful to associate
to each node $x \in T$ the residual graph $G^x$ in the following recursive way: $G^{\mathrm{root}(T)} = G$,
$G^{r(x)} = G^x \setminus N(a(x))$ and $G^{l(x)} = G^x \setminus a(x)$. Figure 1 presents an example of a decision tree.

A decision tree T can be interpreted as a querying strategy as follows: First query the edge
$a(\mathrm{root}(T)) \in G$ associated to the root of T; if this query is successful then add $a(\mathrm{root}(T))$
to the current matching and proceed querying using the right subtree of root(T), otherwise
just proceed querying using the left subtree of root(T). Hence, the execution of T over a
scenario $\sigma$ induces a path of nodes $x_1, x_2, \ldots, x_k$ in T, where $a(x_1), a(x_2), \ldots, a(x_k)$ is the
sequence of edges queried by T and $G^{x_1}, G^{x_2}, \ldots, G^{x_k}$ the sequence of residual graphs.

Notice that, since T only queries permissible edges, the matching obtained is exactly
the set of queried edges which belong to $\sigma$; this matching is denoted by $M(T,G)(\sigma)$. After
unpacking previous definitions, we can also write the random matching $M(T,G)$ recursively
as follows (where $\rho = \mathrm{root}(T)$ and $e = a(\mathrm{root}(T))$ to simplify the notation):
$M(T,G) = e \cup M(T_{r(\rho)}, G \setminus N(e))$ with probability $p_e$ and
$M(T,G) = M(T_{l(\rho)}, G \setminus e)$ with probability $1 - p_e$. Thus, the expected size of $M(T,G)$ is given by

$$\mathbb{E}M(T,G) = p_e \left(1 + \mathbb{E}M(T_{r(\rho)}, G \setminus N(e))\right) + (1 - p_e)\, \mathbb{E}M(T_{l(\rho)}, G \setminus e), \qquad (1)$$

where we use $\mathbb{E}M(T,G)$ instead of $\mathbb{E}[|M(T,G)|]$ to simplify the notation.
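
The recursion in equation (1) translates directly into code. The sketch below is a minimal illustration under an assumed encoding: a decision tree is either `None` (no further queries) or a triple `(edge, left, right)`, where `left` is followed when the query fails and `right` when it succeeds. The tree is assumed to query only permissible edges, as required by the definition; the function does not check this.

```python
def expected_matching(tree, prob):
    """Expected matching size of a decision tree, following equation (1).

    tree: None, or a triple (edge, left, right); `right` is the subtree used
          after a successful query and `left` after an unsuccessful one.
    prob: dict mapping each edge to its existence probability p_e.
    """
    if tree is None:
        return 0.0
    edge, left, right = tree
    p = prob[edge]
    # Success contributes the matched edge plus the value of the right subtree;
    # failure contributes only the value of the left subtree.
    return p * (1.0 + expected_matching(right, prob)) + (1.0 - p) * expected_matching(left, prob)
```

For the path a-b-c with both probabilities equal to 1/2, the tree that queries (a, b) and, on failure, queries (b, c) has expected value 1/2 + 1/2 * 1/2 = 3/4.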

We remark that every strategy can be represented by a decision tree, so we use $T^S$ to
denote a decision tree corresponding to a strategy S and use both terms interchangeably.

Making use of the above definitions, we can formally state the query-commit problem: 
given a weighted graph $G = (V,E,p)$ with $p : E \to (0,1]$, we want to find a decision tree T
for G that maximizes $\mathbb{E}M(T,G)$. The value of an optimal solution is denoted by OPT(G).
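
For very small graphs the optimal value can be computed by exhaustive search over residual graphs, since the edges are independent and the residual graph captures everything a strategy needs to know about past queries. The sketch below is a brute-force illustration, not the polynomial-time method developed later in the paper; the name `opt_value` and the encoding of edges as pairs are assumptions of the example, and the running time is exponential.

```python
from functools import lru_cache


def opt_value(prob):
    """Exhaustively compute OPT(G) for a small weighted graph.

    prob: dict mapping each edge (u, v) to its existence probability p_e.
    """

    @lru_cache(maxsize=None)
    def opt(residual):
        best = 0.0
        for edge in residual:
            u, v = edge
            # On success, the edge and all edges sharing an endpoint are removed.
            after_success = frozenset(f for f in residual if u not in f and v not in f)
            # On failure, only the queried edge is removed.
            after_failure = residual - {edge}
            p = prob[edge]
            value = p * (1.0 + opt(after_success)) + (1.0 - p) * opt(after_failure)
            best = max(best, value)
        return best

    return opt(frozenset(prob))
```

On the path a-b-c-d with all probabilities equal to 1/2, querying an end edge first gives expected value 1.125, while querying the middle edge first gives at most 1; the function returns 1.125.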

We are interested in finding computationally efficient strategies for the query-commit 
problem. Since decision trees may already be exponentially larger than the input graph, our
measure of complexity must allow implicitly defined strategies. We say that a strategy is
polynomial-time computable if the time used to decide the query in each step is bounded by
a polynomial in the description of the input graph; this includes any preprocessing time (i.e.
time to construct a decision tree). In Section 3.1 we present structures similar to decision
trees that are useful to describe time-efficient strategies. 

3 General theoretical results 

We say that an edge of a graph is pendant if at least one of its endpoints has degree 1. As 
a start for our theoretical results, we show that pendant edges can be queried first without 
compromising the optimality of the strategy. This observation will be fundamental for the 
development of the polynomial-time computable algorithm for sparse graphs and is also used 
in the heuristics tested in the experimental section. 

Figure 1: (a) Input graph G with edges labeled from 1 to 4. (b) Example of a decision tree
T for G, with labels corresponding to $a(x)$ for each internal node x of T. Let y denote the
father of I3, so that $a(y) = 4$; $G^y$ is the subgraph of G consisting of edges 3 and 4. The bold
path in T is the path obtained by executing T over the scenario $\sigma$ which consists of edges 2,
3 and 4. Moreover, $M(T,G)(\sigma) = \{2\}$.

Lemma 1. Consider a weighted graph G and let e be a pendant edge in G. Then there is 
an optimal strategy whose first query is e. 

In order to illustrate the relevance of pendant edges we mention the following result. 

Lemma 2. Suppose G is a weighted forest and let S be a strategy that always queries a 
pendant edge in the residual graph. Then for every $\sigma \sim G$, $|M(S,G)(\sigma)| = \mu(\sigma)$.
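
The statement of Lemma 2 can be checked empirically on small forests. The sketch below simulates a strategy that always queries a pendant edge of the residual graph and compares the size of the resulting matching with the maximum matching of the scenario, computed by brute force. Both function names and the tie-breaking rule (the smallest pendant edge in sorted order) are assumptions of the example; the input is assumed to be a forest, so that a nonempty residual graph always has a pendant edge.

```python
from itertools import combinations


def max_matching_size(edges):
    """Brute-force size of a maximum matching among the given edges."""
    edges = list(edges)
    for k in range(len(edges), 0, -1):
        for subset in combinations(edges, k):
            nodes = [x for e in subset for x in e]
            if len(nodes) == len(set(nodes)):  # no shared endpoints
                return k
    return 0


def pendant_strategy(graph_edges, scenario):
    """Always query a pendant edge of the residual graph (forest input assumed)."""
    residual = set(graph_edges)
    matching = []
    while residual:
        degree = {}
        for u, v in residual:
            degree[u] = degree.get(u, 0) + 1
            degree[v] = degree.get(v, 0) + 1
        # In a forest with at least one edge, a pendant edge always exists.
        e = next(f for f in sorted(residual) if degree[f[0]] == 1 or degree[f[1]] == 1)
        if e in scenario:
            matching.append(e)
            residual = {f for f in residual if e[0] not in f and e[1] not in f}
        else:
            residual.discard(e)
    return matching
```

Running both functions over all 16 scenarios of a path with four edges gives matchings of equal size in every scenario, as the lemma predicts.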

3.1 Optimal strategies for sparse graphs 

We say that a graph G is d-sparse if $e(G) \le v(G) + d$. In this section we exhibit a polynomial-time
optimal strategy for d-sparse graphs when d is constant. We focus on connected graphs
but the result can be extended by considering separately the connected components of the
graph.

So let G be a connected d-sparse graph, and further assume (for now) that G does not
have any pendant edges; the rationale for the latter is that by Lemma 1 we can always
start by querying pendant edges until we reach a residual graph which has none.

Contracted decision trees. First, we need to introduce the concept of a contracted decision
tree (CDT), which generalizes the decision trees introduced in Section 2.

Given a strategy S for a weighted graph H we use $\mathcal{R}(S,H)$ to denote the set of possible
residual graphs after the execution of S, that is, $\mathcal{R}(S,H) = \{H^x : x \text{ is a leaf of } T^S\}$. Then a
contracted decision tree T for G is a rooted tree where every node x is associated to a residual
graph $G^x$ and every internal node x is associated to a query strategy $S^x$ for $G^x$ satisfying
the following: (i) $G^{\mathrm{root}(T)} = G$; (ii) every internal node x of T has exactly $q = |\mathcal{R}(S^x, G^x)|$
children $y_1, y_2, \ldots, y_q$ and $\mathcal{R}(S^x, G^x) = \{G^{y_1}, G^{y_2}, \ldots, G^{y_q}\}$; (iii) if x is an ancestor of y in T
then 7^ 5"^. Figure 2 presents an example of a contracted decision tree.

Figure 2: (a) Input graph G with edges labeled from 1 to 4. (b) A decision tree T' for the
subgraph of G induced by edges 1 and 2. The set $\mathcal{R}(T', G)$ consists of the graphs $G_1$, $G_2$
and $G_3$, where $G_1$ is the subgraph of G induced by edges 3 and 4, $G_2$ is the empty graph
and $G_3$ is induced by edge 3. (c) A CDT $\tilde{T}$ for G whose root is associated to the strategy T'.
The graphs for the children of root($\tilde{T}$) are, respectively from left to right, $G_1$, $G_2$ and $G_3$.
The decision tree corresponding to $\tilde{T}$ is T in Figure 1b. Conversely, $\tilde{T}$ can be obtained by
contracting T' in T.

A few remarks are in order. First, condition (iii) is not really fundamental in the definition,
although it avoids trivial cases in future proofs. More importantly, notice that a
decision tree is simply a CDT where each strategy $S^x$ queries a single edge of $G^x$. Also, a CDT
can be seen as a decision tree where some of its subtrees were contracted into a single node
and, conversely, we can obtain a decision tree from a contracted decision tree by expanding the
partial strategies $S^x$ into decision trees.

A CDT T can be interpreted as a strategy in a similar way as a decision tree: start with
an empty matching at the root $\rho = \mathrm{root}(T)$ and query according to $S^\rho$, which gives a particular
residual graph $R \in \mathcal{R}(S^\rho, G^\rho)$ depending on the current scenario $\sigma$; add $M(S^\rho, G^\rho)(\sigma)$
to the current matching and proceed querying using the subtree $T_x$, where x is the child of
$\rho$ with $G^x = R$. The expected size of the matching obtained by a CDT T can be written as
a recursive expression similar to (1):

$$\mathbb{E}M(T,G) = \mathbb{E}M(S^\rho, G^\rho) + \sum_{x \,:\, x \text{ is a child of } \rho} \Pr(G^x) \cdot \mathbb{E}M(T_x, G^x), \qquad (2)$$

where $\Pr(G^x)$ is the probability, with respect to the scenarios of $G^\rho$, that the residual graph
is $G^x$ after employing $S^\rho$ to $G^\rho$.

Decomposition of G and filtering of the strategy space. Now we turn again to the
problem of finding an optimal strategy for G. Let $V_{\ge 3}$ be the set of nodes of G which have
degree at least 3. Since the nodes in $G \setminus V_{\ge 3}$ have degree at most 2, all of its connected
components are either paths or cycles; moreover, we claim that all of them are actually paths.
By means of contradiction suppose a connected component of $G \setminus V_{\ge 3}$ is a cycle C. Since G
is connected, it must contain an edge from a node $u \in C$ to a node in $V_{\ge 3}$. However, this
implies that u has degree at least 3 in G and hence $u \in V_{\ge 3}$, contradicting the fact that C is a
component of $G \setminus V_{\ge 3}$.

In light of the previous claim it is useful to think about the structure of G in terms of
$V_{\ge 3}$ and the paths $P_1, P_2, \ldots$ that are the connected components of $G \setminus V_{\ge 3}$. Notice that the
edges of G which have an endpoint in $V_{\ge 3}$ are not present in this decomposition; however,
there are at most $6d$ of these edges, which allows us to ignore them for most of the
discussion of the algorithm. To see this upper bound on the number of such edges first notice
that $|V_{\ge 3}| \le 2d$; this holds because G has no pendant edges and hence all of its nodes have
degree at least two, so summing over all degrees in G we get

$$2e(G) \ge 3|V_{\ge 3}| + 2\left(v(G) - |V_{\ge 3}|\right) \;\Rightarrow\; |V_{\ge 3}| \le 2\left(e(G) - v(G)\right) \le 2d.$$

Then if a is the number of edges with some endpoint in $V_{\ge 3}$ we again add all degrees of G
to obtain

$$2e(G) \ge a + 2\left(v(G) - |V_{\ge 3}|\right) \;\Rightarrow\; a \le 2\left(e(G) - v(G)\right) + 2|V_{\ge 3}| \le 6d,$$

obtaining the desired bound.
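
The counting bounds above can be checked on a concrete instance. The sketch below computes, for a graph without pendant edges, the excess d = e(G) - v(G), the number of nodes of degree at least 3, the number of edges with an endpoint of degree at least 3, and the number of connected components left after removing those nodes. The function name `decomposition_stats` and the union-find bookkeeping are choices made for this example only.

```python
def decomposition_stats(edges):
    """Return (d, |V_>=3|, a, number of components of G minus V_>=3).

    edges: list of pairs (u, v) of a connected graph without pendant edges.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    hubs = {u for u, k in degree.items() if k >= 3}
    # a = number of edges with some endpoint of degree at least 3
    incident = sum(1 for u, v in edges if u in hubs or v in hubs)
    # Union-find over the remaining nodes to count the components (the paths).
    parent = {u: u for u in degree if u not in hubs}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for u, v in edges:
        if u in parent and v in parent:
            parent[find(u)] = find(v)
    paths = len({find(u) for u in parent})
    d = len(edges) - len(degree)
    return d, len(hubs), incident, paths
```

For the graph formed by two nodes s and t joined by three internally disjoint paths with two interior nodes each, we have e(G) = 9, v(G) = 8 and d = 1; the function reports 2 nodes of degree at least 3, 6 incident edges and 3 paths, so the bounds 2d, 6d and 3d are all attained.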

This result also leads to a bound on the number of paths $P_i$. To see this, consider a
path $P_i$ and let u and v be its endpoints. The assumption that G does not have pendant
edges implies that there are two distinct edges with one endpoint in $\{u,v\}$ and the other in
$V_{\ge 3}$. Since there are at most $6d$ of these edges in G and since the $P_i$'s are disjoint, we have
that there are at most $3d$ paths $P_i$.

Exploiting the structure of G highlighted in the previous observations, we can construct
in polynomial time a CDT which gives an optimal querying strategy. We briefly sketch the
argument used to obtain this result. The main observation is that after querying an edge e
in $P_i$ we always obtain a residual graph where some edges in $P_i$ are now pendant. Then we
can use Lemma 1 to keep querying pendant edges, which leads to a residual graph that does
not contain any edges of $P_i$. These observations imply that in order to obtain an optimal
strategy for G we essentially only need to decide which edge to query first in each $P_i$, as
well as an ordering among these edges and the edges with an endpoint in $V_{\ge 3}$; all the rest of the
strategy follows from querying pendant edges. Moreover, using the fact that there are at
most $6d$ edges with an endpoint in $V_{\ge 3}$ and at most $3d$ paths $P_i$, we can enumerate all these
possibilities in time $\mathrm{poly}(e(G))$ and obtain the desired result.

Now we formalize these ideas. Consider a subgraph H of G and consider one of the
paths $P_i$, given by $(u_1 e_1 u_2 e_2 \ldots e_q u_{q+1})$. For an edge $e = e_j \in H$ we define $S(H,e)$ as the
strategy which queries edges $e_j, e_{j-1}, \ldots, e_1$ in this order and then queries $e_{j+1}, e_{j+2}, \ldots, e_q$ in
this order, always ignoring edges which do not belong to H. Essentially $S(H,e)$ queries
e first and then the edges in $P_i$ which become pendant. Notice that there are actually two
strategies satisfying the above properties, depending on the orientation of the path $P_i$; so we
fix an arbitrary orientation for the paths $P_i$ in order to avoid ambiguities.
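
The query order used by the strategy S(H, e) can be written down in a few lines. The sketch below returns only the priority order of the path edges; the name `path_strategy_order`, the list encoding of the path and the 1-based index `j` are assumptions of the example. At run time, edges made impermissible by an earlier successful query would additionally be skipped.

```python
def path_strategy_order(path_edges, j, H):
    """Priority order in which S(H, e_j) considers the edges of a path.

    path_edges: [e_1, ..., e_q] listed along the fixed orientation of the path
    j:          1-based index of the edge that is queried first
    H:          collection of edges of the current subgraph
    """
    backward = path_edges[j - 1::-1]  # e_j, e_{j-1}, ..., e_1
    forward = path_edges[j:]          # e_{j+1}, ..., e_q
    # Edges that do not belong to H are ignored.
    return [e for e in backward + forward if e in H]
```

With five path edges, j = 3 and e_2 missing from H, the order is e_3, e_1, e_4, e_5.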

It follows directly from the definition of $S(H,e)$ that for any residual graph $R \in \mathcal{R}(S(H,e), H)$
we have $R \cap E(P_i) = \emptyset$. More specifically, the set of residual graphs $\mathcal{R}(S(H,e), H)$ can
only contain the following graphs: $H \setminus \{u_1, \ldots, u_{q+1}\}$, $H \setminus \{u_2, \ldots, u_{q+1}\}$, $H \setminus \{u_1, \ldots, u_q\}$
and $H \setminus \{u_2, \ldots, u_q\}$. For instance suppose nodes $u_1$ and $u_{q+1}$ both belong to H; then
$H \setminus \{u_1, \ldots, u_q\}$ is the residual graph of the scenario $\sigma$ iff $u_1$ is the endpoint of an edge in
$M(S(H,e), H)(\sigma)$ and $u_{q+1}$ is not.

Now the next lemma makes formal the assertion that after querying the first edge of $P_i$
we can just proceed by querying pendant edges.

Lemma 3. Let H be a subgraph of G. Let e be an edge in $E(H) \cap E(P_i)$ and suppose that
there is an optimal strategy for H that queries e first. Then there is an optimal CDT for H
whose root is associated to the strategy $S(H,e)$.

Now let $\mathcal{F}$ be the family of all CDT's for G and its subgraphs with the following properties.
A CDT T for a subgraph H of G belongs to $\mathcal{F}$ if: (i) T has height at most $9d + 1$; (ii) each
node x of T is either associated to a strategy which consists of querying one edge incident to
$V(H^x) \cap V_{\ge 3}$ or it is associated to a strategy $S(H^x, e)$ for some $P_i$ and some $e \in E(H^x) \cap E(P_i)$.
The usefulness of this definition comes from the fact that we can focus only on this family
of CDT's.

Lemma 4. There is an optimal CDT for G which belongs to $\mathcal{F}$.

This lemma is formally proved in the Online Supplement, but the structure of the proof
is the following. First we apply Lemma 3 repeatedly to show the existence of an optimal
CDT satisfying property (ii) of $\mathcal{F}$. To complete the proof, we make use of the fact that there
are at most $6d$ edges incident to $V_{\ge 3}$ and at most $3d$ paths $P_i$ in G to show that there
is one such optimal CDT in $\mathcal{F}$ satisfying the height requirement (i).

The main point in restricting to CDT's in $\mathcal{F}$ is that there is only a polynomial number
of them, as stated in the next lemma.

Lemma 5. The family J-" has at most e{G)^'^'^^^ CDT's. 

Computing the value of a tree in $\mathcal{F}$. In light of the previous section, we only need to
find the best among the (polynomially many) CDT's in $\mathcal{F}$ in order to obtain an optimal
strategy for G. However, we still need to be able to efficiently calculate the expected size
of the matching obtained by each such tree. In this section we show that this can be done
recursively employing equation (2).

Consider a CDT $T \in \mathcal{F}$ and let x be a node in T. Assume that we have already calculated
$\mathbb{E}M(T_y, G^y)$ for all proper descendants y of x. To calculate $\mathbb{E}M(T_x, G^x)$ we consider two
different cases depending on $S^x$.

Case 1: $S^x$ only queries an edge e incident to $V_{\ge 3}$. As mentioned previously,
$\mathcal{R}(S^x, G^x) = \{G^x \setminus N(e), G^x \setminus e\}$. Since $p_e$ is the probability that e belongs to the realization of $G^x$, we
have that $\mathbb{E}M(S^x, G^x) = p_e$ and the probability that $G^x \setminus N(e)$ is the residual graph is also
$p_e$. Therefore, if $y_1$ and $y_2$ are the children of x associated respectively to the residual graphs
$G^x \setminus N(e)$ and $G^x \setminus e$, then equation (2) reduces to:

$$\mathbb{E}M(T_x, G^x) = p_e + p_e\, \mathbb{E}M(T_{y_1}, G^{y_1}) + (1 - p_e)\, \mathbb{E}M(T_{y_2}, G^{y_2}).$$

After inductively obtaining all the terms in the right hand side of the previous expression,
$\mathbb{E}M(T_x, G^x)$ can be computed directly.

Case 2: $S^x$ equals $S(G^x, e)$. We proceed in the same way as in Case 1, calculating the
terms of the right hand side of (2). However, these computations are not as straightforward
as in the previous case. Given a random matching M, let $u \in M$ denote the event that node
u is matched in M and let $u \notin M$ be the complementary event. The following lemma is the
main tool used during this section and can be obtained via dynamic programming.

Lemma 6. Consider a path $P_i$ and a subgraph H of G. Also consider an edge $e \in E(H) \cap E(P_i)$
and define $M = M(S(H,e), H)$. Then there is a procedure which runs in time polynomial
in the size of G and computes the values $\mathbb{E}M$, $\Pr(u_1 \in M)$, $\Pr(u_{q+1} \in M)$ and
$\Pr(u_1 \in M \wedge u_{q+1} \in M)$.

The previous lemma directly gives that $\mathbb{E}M(S^x, G^x)$ can be computed in polynomial time,
so we only need to compute the probabilities that $R \in \mathcal{R}(S^x, G^x)$ is the residual graph after
employing $S^x$ to $G^x$. For that, let $P_i = (u_1 e_1 u_2 e_2 \ldots e_q u_{q+1})$ be such that $e \in P_i$. Recall that
$\mathcal{R}(S^x, G^x)$ can only contain graphs from the list: $G^x \setminus \{u_1, \ldots, u_{q+1}\}$, $G^x \setminus \{u_2, \ldots, u_{q+1}\}$,
$G^x \setminus \{u_1, \ldots, u_q\}$ and $G^x \setminus \{u_2, \ldots, u_q\}$. It is easy to see that we can write the probability of
obtaining each residual graph in $\mathcal{R}(S^x, G^x)$ using the probabilities in Lemma 6. For instance,
the probability of obtaining $G^x \setminus \{u_1, \ldots, u_q\}$ is exactly

$$\Pr\left(u_1 \in M(S^x, G^x) \wedge u_{q+1} \notin M(S^x, G^x)\right) = \Pr\left(u_1 \in M(S^x, G^x)\right) - \Pr\left(u_1 \in M(S^x, G^x) \wedge u_{q+1} \in M(S^x, G^x)\right).$$

Therefore, Lemma 6 implies that there is an efficient algorithm to compute all terms in
the right hand side of equation (2), which gives an efficient way to compute $\mathbb{E}M(T_x, G^x)$ as
in Case 1.
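
The four quantities of Lemma 6 can be illustrated on a short path by enumerating all scenarios. The sketch below is a brute-force check that runs in exponential time; it is not the dynamic program referred to in the lemma. It assumes that every edge of the path belongs to H, and the name `path_strategy_stats` and the list encoding of the probabilities are assumptions of the example.

```python
from itertools import product


def path_strategy_stats(probs, j):
    """Brute-force E|M|, Pr(u_1 in M), Pr(u_{q+1} in M) and the joint probability.

    probs: [p_1, ..., p_q], where p_i is the probability of edge e_i on the
           path u_1 e_1 u_2 ... e_q u_{q+1}
    j:     1-based index of the first edge queried by the path strategy
    """
    q = len(probs)
    # 0-based query order: e_j, e_{j-1}, ..., e_1, then e_{j+1}, ..., e_q.
    order = list(range(j - 1, -1, -1)) + list(range(j, q))
    exp_size = pr_first = pr_last = pr_both = 0.0
    for scenario in product([False, True], repeat=q):
        weight = 1.0
        for i, present in enumerate(scenario):
            weight *= probs[i] if present else 1.0 - probs[i]
        matched = set()
        size = 0
        for i in order:  # edge with 0-based index i joins nodes i and i + 1
            if i in matched or i + 1 in matched:
                continue  # edge no longer permissible
            if scenario[i]:
                matched.update((i, i + 1))
                size += 1
        first, last = 0 in matched, q in matched
        exp_size += weight * size
        pr_first += weight * first
        pr_last += weight * last
        pr_both += weight * (first and last)
    return exp_size, pr_first, pr_last, pr_both
```

For a path with two edges of probability 1/2 and j = 1, the values are 0.75, 0.5, 0.25 and 0, so the probability that the first endpoint is matched while the last one is not equals 0.5 - 0 = 0.5.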

Putting everything together. Consider a connected d-sparse graph G, possibly containing
pendant edges. We define a strategy S for G which proceeds in two steps. First,
S queries pendant edges until none exists. At this point we have a residual graph G' with
no pendant edges. In the second step it queries according to an optimal strategy S' for G'.
Applying Lemma 1 repeatedly, and using the fact that S' is optimal, we get that S is an
optimal strategy for G.

In order to prove that S is polynomial-time computable we only need to show that S' is
polynomial-time computable. To do so, we need the fact that every connected component
of G' is d-sparse, which follows from successive applications of the following easy lemma.

Lemma 7. Let G be a connected d-sparse graph. Then for every edge $e \in E(G)$, the
connected components of $G \setminus e$ and $G \setminus N(e)$ are d-sparse.

Let $G'_1, \ldots, G'_k$ be the connected components of G'. Since each $G'_i$ is d-sparse, we can use the
tools from previous sections to find an optimal strategy $S'_i$ for $G'_i$: we enumerate the CDT's
in the family $\mathcal{F}$ for $G'_i$, whose number is bounded in Lemma 5, and calculate the value of each
of them using the procedure outlined in the previous subsection; then letting $S'_i$ be the strategy
among those which has largest value we get that $S'_i$ is optimal for $G'_i$ (cf. Lemma 4). Then an
optimal strategy S' for G' is obtained by querying according to strategy $S'_1$, then $S'_2$ and so on.
Notice that the total time needed to compute the $S'_i$'s is bounded by the sum, over the components
$G'_i$, of a polynomial in the bound of Lemma 5 for $G'_i$, which is polynomial
in $e(G)$ for constant d. Thus, S' is polynomial-time computable and we obtain the desired
result.

Theorem 1. Let G be a connected weighted graph satisfying $e(G) \le v(G) + d$ for a fixed d.
Then there is a polynomial-time computable optimal strategy for G.

4 Theoretical results for kidney exchanges 

Now we focus on the kidney exchange application for the query-commit problem. In this context,
weighted graphs are interpreted as weighted compatibility graphs: each node represents
a patient/donor pair and the weight of an edge represents the likelihood of cross-compatibility
between its endpoints. Our main result in this section is to show that the majority of compatibility
graphs admit a simple querying strategy that matches essentially all of their nodes.
This result provides a strong motivation for creating a large unified bank of patients and
donors.

4.1 Generating weighted compatibility graphs 

In order to make the previous claim formal we need to introduce a distribution of compatibility
graphs. In the context of deterministic kidney exchanges, [25] introduced a process
to randomly generate unweighted compatibility graphs. This process is modeled over data
maintained by the United Network for Organ Sharing in order to produce realistic instances
and several works have considered slight variations of it [H, 21, 22, 23, 26]. Two physiological
attributes are considered to determine the incompatibility of a patient and a donor. The first
is their ABO blood type, where a patient is blood-type incompatible with a donor if their
blood-type pair is one of the following: O/A, O/B, O/AB, A/B, A/AB, B/A and B/AB.
The second factor is tissue-type incompatibility, represented by PRA (percent reactive
antibody) levels. We now briefly describe the process from [23] which generates unweighted
graphs and then we mention the slight modification that we use to generate weighted graphs.

A patient/donor pair is characterized by 5 quantities: the ABO blood type of the patient,
the ABO blood type of the donor, an indicator of whether the patient is the wife of the donor, the
PRA level of the patient, and an indicator of whether the patient is compatible with the donor. A
random patient/donor pair is obtained by independently assigning a value to each of the first four
quantities and then picking the compatibility depending on these values. The distribution
of these values is described in detail in [25] and we only highlight one key property:

Fact 1. For every pair of blood types i/j, the probability that a random patient/donor pair
has blood type i/j and is incompatible is nonzero.

In order to generate an unweighted compatibility graph on $n$ vertices we first sample
random patient/donor pairs until obtaining $n$ incompatible pairs; each such pair is added as a
vertex to the graph. Then for every pair $u, v$ of nodes in the graph, the probability $P(u,v)$ of
cross-compatibility between them is defined based on their physiological characteristics. A
key property is that for any two patient/donor pairs $u$ and $v$ the quantity $P(u,v)$ is nonzero
if and only if their blood types are cross-compatible (i.e. the patient of $u$ is blood-type
compatible with the donor of $v$ and the patient of $v$ is blood-type compatible with the donor
of $u$). More specifically, there is a constant $c_1 < 1$ independent of $n$ such that

$$P(u,v) > 0 \;\Rightarrow\; P(u,v) \ge 1 - c_1. \qquad (3)$$

To conclude the construction of the graph, for each pair of nodes $u, v$ a coin is flipped
independently and with probability $P(u,v)$ an edge is added between $u$ and $v$.

Finally, the modification of the above procedure to generate weighted compatibility
graphs consists of changing the last step: if $P(u,v) > 0$ then the edge $(u, v)$ is added to
the graph and $P(u,v)$ becomes its weight.
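
The blood-type rule above is small enough to write out directly. The following is a minimal Python sketch, not the generator used in the paper: the names `blood_compatible` and `cross_compatible` are illustrative, and tissue-type (PRA) incompatibility and the actual value of P(u, v) are deliberately left out.

```python
# Patient/donor blood-type pairs that are incompatible, as listed in the text.
INCOMPATIBLE = {
    ("O", "A"), ("O", "B"), ("O", "AB"),
    ("A", "B"), ("A", "AB"),
    ("B", "A"), ("B", "AB"),
}


def blood_compatible(patient, donor):
    """True if a patient of the given blood type can receive from the donor."""
    return (patient, donor) not in INCOMPATIBLE


def cross_compatible(u, v):
    """u and v are (patient_type, donor_type) pairs; both directions must work."""
    return blood_compatible(u[0], v[1]) and blood_compatible(v[0], u[1])
```

For example, a pair of type A/B and a pair of type B/A are cross-compatible, which is why nodes with blood types i/j and j/i can always be joined by an edge of nonzero weight.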

4.2 An almost optimal strategy 

In this section we present a simple querying strategy $S$ that achieves $EM(S, G) \approx v(G)/2$ for
almost all weighted compatibility graphs $G$ generated by the above procedure. To obtain
such a strategy we decompose the graph into cliques and complete bipartite graphs based on
blood-type compatibility. Then we obtain good strategies for these structured subgraphs
and finally compose them into a strategy for the original graph.

Let $\mathcal{D}^n$ be the distribution of weighted compatibility graphs on $n$ nodes generated by
the procedure from the previous section. For a graph $G$ in $\mathcal{D}^n$ we use $V_{i,j}(G)$ to denote the
subset of vertices corresponding to the patient/donor pairs which have blood type i/j. The
lower case version $v_{i,j}(G)$ is used to denote the cardinality of $V_{i,j}(G)$.

Consider a random graph $G \sim \mathcal{D}^n$ and a node $u$ of it. The first observation is that the
probability that $u$ has patient/donor blood type i/j is equal to

$$\Pr(\text{P/D has blood type } i/j \mid \text{P and D are incompatible}),$$

where P/D is a random patient/donor pair. Using the definition of conditional probability
and Fact 1 we get that this probability is nonzero. Since we have finitely many blood types,
this means that there is a constant $c_2 > 0$ independent of i/j and $n$ such that

$$\Pr(u \text{ has patient/donor blood type } i/j) \ge c_2 \quad \text{for all } i, j. \qquad (4)$$

This property, together with the symmetry of blood types of patients and donors, gives the
following fact.

Fact 2. The following properties hold for every $i, j$ (where the expectation is taken with
respect to the distribution $\mathcal{D}^n$):

1. $E[v_{i,j}] = \Omega(n)$

2. $E[v_{i,j}] = E[v_{j,i}]$

In order to describe and analyze the proposed querying strategy, we first focus on the
set of graphs in $\mathcal{D}^n$ which have a typical number of nodes associated to each pair of blood
types. That is, for $\alpha > 0$ we consider $\mathcal{G}^n_\alpha$, which is defined as the set of all graphs $G$ in
$\mathcal{D}^n$ which satisfy

$$(1-\alpha)\,E[v_{i,j}] \le v_{i,j}(G) \le (1+\alpha)\,E[v_{i,j}] \quad \text{for all } i, j.$$

We first show how to obtain good strategies for graphs in $\mathcal{G}^n_\alpha$ and then argue that most of
the graphs in $\mathcal{D}^n$ are in this family.

Fix $\alpha \in (0, 1)$ and consider a graph $G$ in $\mathcal{G}^n_\alpha$. Let $G_{i,j}$ be the induced subgraph
$G[V_{i,j}(G) \cup V_{j,i}(G)]$. Clearly these subgraphs partition the nodes of $G$. Moreover, every node in
$V_{i,j}(G)$ is blood-type cross-compatible with every node in $V_{j,i}(G)$, since a patient is blood-type
compatible with a donor with the same blood type. Therefore, the construction of $G$ (and more
specifically the properties of $P(u,v)$) implies that $G_{i,j}$ contains a complete bipartite graph between
$V_{i,j}(G)$ and $V_{j,i}(G)$ if $i \ne j$ and is a complete graph if $i = j$. The fact that $G \in \mathcal{G}^n_\alpha$ and part 2
of Fact 2 additionally give the following: if $i \ne j$ there is a complete bipartite subgraph $G'_{i,j}$ of
$G_{i,j}$ with equally sized vertex classes and with $v(G'_{i,j}) \ge 2(1-\alpha)E[v_{i,j}]$.

The motivation for partitioning $G$ is that there are very simple strategies that work well
in complete (bipartite) graphs, as shown in the next two lemmas.

Lemma 8. Let $G$ be a weighted complete bipartite graph with $k$ vertices in each vertex class.
Then for every $\epsilon > 0$ there is a strategy $S$ such that $EM(S,G) \ge (1 - q^{\lfloor k/2 \rfloor} - \epsilon)k$, where
$q = \max\{1 - p_e : e \in E(G)\}$.

The analogous lemma when $G$ is a clique can be proved by applying the previous lemma
to a complete bipartite subgraph of $G$ with vertex classes containing $\lfloor v(G)/2 \rfloor$ vertices.

Lemma 9. Let $G$ be a weighted complete graph with $k$ vertices. Then for every $\epsilon > 0$ there
is a strategy $S$ such that $EM(S, G) \ge (1 - q^{\lfloor (k-1)/2 \rfloor} - \epsilon)\lfloor k/2 \rfloor$, where
$q = \max\{1 - p_e : e \in E(G)\}$.

Let $S'_{i,j}$ be the strategy given by Lemma 8 for $G'_{i,j}$ and $S_i$ be the strategy given by Lemma
9 for $G_{i,i}$. Since the graphs $G'_{i,j}$'s and $G_{i,i}$'s are disjoint, we can apply the strategies $S'_{i,j}$'s
and $S_i$'s sequentially and obtain a strategy $S$ for $G$ such that

$$EM(S, G) = \frac{1}{2}\sum_{i \ne j} EM(S'_{i,j}, G'_{i,j}) + \sum_{i} EM(S_i, G_{i,i}),$$

where the factor 1/2 in the right hand side appears because the graph $G'_{i,j} = G'_{j,i}$ is counted
twice. Defining $q = \max\{1 - p_e : e \in E(G)\}$ and employing the bounds from Lemmas 8 and
9 we have that for every $\epsilon > 0$

$$EM(S, G) \ge \frac{1}{2}\sum_{i \ne j}\left(1 - q^{\lfloor v(G'_{i,j})/4 \rfloor} - \epsilon\right)\frac{v(G'_{i,j})}{2} + \sum_{i}\left(1 - q^{\lfloor (v(G_{i,i})-1)/2 \rfloor} - \epsilon\right)\left\lfloor \frac{v(G_{i,i})}{2} \right\rfloor.$$

Since $G \in \mathcal{G}^n_\alpha$, Fact 2 gives that $v(G_{i,i}) = v_{i,i}(G) \ge (1-\alpha)E[v_{i,i}] = \Omega(n)$ and
$v(G'_{i,j}) \ge 2(1-\alpha)E[v_{i,j}] = \Omega(n)$. This gives the following asymptotic bound on the quality of strategy
$S$ (assuming $n$ large enough):



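$$\begin{aligned}
EM(S,G) &\ge \left(1 - q^{\Omega(n)} - \epsilon\right)\left[\frac{1}{2}\sum_{i \ne j}\frac{v(G'_{i,j})}{2} + \sum_{i}\left\lfloor\frac{v(G_{i,i})}{2}\right\rfloor\right] \\
&\ge \left(1 - q^{\Omega(n)} - \epsilon\right)\left[\frac{1}{2}\sum_{i \ne j}(1-\alpha)E[v_{i,j}] + \sum_{i}\left(\frac{(1-\alpha)E[v_{i,i}]}{2} - 1\right)\right] \\
&\ge \left(1 - q^{\Omega(n)} - \epsilon\right)\left(\frac{(1-\alpha)n}{2} - 4\right) \\
&\ge \left(1 - c_1^{\Omega(n)} - \epsilon\right)\left(\frac{(1-\alpha)n}{2} - 4\right),
\end{aligned}$$

where $c_1$ is the constant satisfying (3). Since $c_1 < 1$ it follows that
$\liminf_{n\to\infty} EM(S, G)/(n/2) \ge (1-\epsilon)(1-\alpha)$
and thus $S$ matches almost all nodes of the typical graph $G$ for sufficiently large $n$.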

Now we argue that most graphs in $\mathcal{D}^n$ belong to the family $\mathcal{G}^n_\alpha$. That is, we want to
lower bound the probability that a random graph $G \sim \mathcal{D}^n$ has values $v_{i,j}(G)$ concentrated
around their expectations.

Consider a random graph $G \sim \mathcal{D}^n$. Since the blood type of each node of $G$ is chosen
independently, $v_{i,j}(G)$ follows a binomial distribution with $n$ trials. Therefore,
using Hoeffding's inequality [4] we get that the probability that $v_{i,j}(G)$ lies outside
the interval $[(1-\alpha)E[v_{i,j}], (1+\alpha)E[v_{i,j}]]$ is at most $2\exp(-2n(c_2\alpha)^2)$, where $c_2$ is the
constant in (4). Since there are only 4 blood types, and hence 16 pairs i/j, we can employ the
union bound to estimate the probability that every $v_{i,j}(G)$ is close to its expected value. With this we
obtain that the probability that the generated compatibility graph belongs to $\mathcal{G}^n_\alpha$ is at least
$1 - 32\exp(-2n(c_2\alpha)^2)$, which goes to 1 exponentially fast with respect to $n$.
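
The exponent in this bound can be traced in two steps. The following derivation is a sketch under the assumption that Hoeffding's inequality is applied in its standard form for a sum of $n$ independent $[0,1]$-valued variables, with deviation $t = \alpha E[v_{i,j}]$ and the lower bound $E[v_{i,j}] \ge c_2 n$ implied by (4):

```latex
\Pr\big(|v_{i,j}(G) - E[v_{i,j}]| \ge t\big) \le 2\exp\!\left(-\frac{2t^2}{n}\right),
\qquad
t = \alpha E[v_{i,j}] \ge \alpha c_2 n
\;\Longrightarrow\;
\Pr\big(|v_{i,j}(G) - E[v_{i,j}]| \ge \alpha E[v_{i,j}]\big) \le 2\exp\!\big(-2n(c_2\alpha)^2\big).
```

Summing this failure probability over the 16 blood-type pairs gives the factor 32 in the final estimate.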

Combining the fact that there is a good strategy for graphs in $\mathcal{G}^n_\alpha$ with the fact that most
graphs of $\mathcal{D}^n$ belong to $\mathcal{G}^n_\alpha$ gives the desired result.

Theorem 2. For any $\epsilon > 0$ there is an $n_0(\epsilon)$ such that the following holds for every
$n > n_0(\epsilon)$. Consider a random compatibility graph $G \sim \mathcal{D}^n$. Then with probability $1 - \epsilon$
over the distribution of $G$ there is a polynomial-time computable strategy $S$ which achieves
$EM(S,G) \ge (1 - \epsilon)n/2$.



5 Computational results 

In this section we present an experimental evaluation of the performance of simple querying
strategies. In our tests we focus on the application of the query-commit
problem to kidney exchanges, and therefore all weighted graphs used in the tests were
generated randomly according to the procedure described in Section 4.1. The results show
that practical heuristics perform surprisingly well, even when compared to optimal
non-committing strategies. As a preparation for our experimental results, we first address the
issue of estimating the value of a strategy and estimating an upper bound on OPT.

5.1 Estimating the value of a strategy

Section 3.1 already indicated that it is not a trivial task to calculate this value exactly.
Given a weighted graph $G$ and a strategy $S$, the direct way of calculating $EM(S, G)$ involves
finding $M(S,G)(\sigma)$ for every scenario $\sigma \sim G$. However, there is often an exponential (in
$e(G)$) number of such scenarios. When $S$ is given as a decision tree we can compute
$EM(S, G)$ using the recurrence (1); but again this procedure takes time proportional to the
number of nodes in the tree $T^S$, which can be exponential in $e(G)$ and is therefore also impractical.
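
To give a feel for why exact computation does not scale, the following Python sketch evaluates the best achievable expected matching size on a tiny graph by exhaustive recursion over residual graphs. It assumes the query semantics used throughout the paper (a successful query on an edge removes every edge sharing an endpoint with it, a failed query removes only that edge); the function name `opt_value` and the dictionary representation of the graph are illustrative choices, not taken from the paper.

```python
from functools import lru_cache


def opt_value(weights):
    """Best expected matching size over all adaptive query orders.

    weights maps an edge (u, v) to its existence probability. The running
    time is exponential in the number of edges, so use tiny graphs only.
    """
    prob = dict(weights)

    @lru_cache(maxsize=None)
    def best(edges):
        # edges: frozenset of the edges still present in the residual graph
        if not edges:
            return 0.0
        value = 0.0
        for e in edges:
            u, v = e
            p = prob[e]
            # Success: e is matched, so every edge touching u or v disappears.
            after_hit = frozenset(f for f in edges if u not in f and v not in f)
            # Failure: only e itself disappears.
            after_miss = edges - {e}
            value = max(value, p * (1 + best(after_hit)) + (1 - p) * best(after_miss))
        return value

    return best(frozenset(prob))
```

On a path with three edges of weights 0.5, 0.9 and 0.5, the recursion prefers to start from one of the two end edges rather than from the heavier middle edge.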

Despite previous studies on the query-commit and related problems [HI El [9], there is 
currently no efficient algorithm to compute the expected value of a strategy and most results 
rely on an estimation of this value by sampling a subset of the scenarios. Our goal in this 
section is to address how well we can estimate $EM(S, G)$ by sampling from $G$ and to establish
the accuracy as a function of the number of samples.

Given a weighted graph $G$ on $n$ nodes and a strategy $S$, the natural way to obtain an
unbiased estimate of $EM(S, G)$ is to sample independently $k$ scenarios $\sigma_1, \ldots, \sigma_k \sim G$ and
take $\eta = (1/k)\sum_{i=1}^{k} |M(S,G)(\sigma_i)|$ as the estimate. Clearly $E[\eta] = EM(S,G)$. Moreover,
since $|M(S,G)(\sigma_i)| \in [0, n/2]$ for all $i$ we also have that $\eta \in [0, n/2]$. In order to show that
$\eta$ is concentrated around the expectation we can simply employ Hoeffding's inequality to
obtain that for every $t > 0$

$$\Pr\big(|\eta - EM(S,G)| > t\big) \le 2\exp\!\left(-\frac{8kt^2}{n^2}\right). \qquad (5)$$

According to this expression, we need approximately $0.4611\,n^2/t^2$ samples in order to obtain
a 95% confidence interval equal to $EM(S, G) \pm t$.
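
The sample count quoted above follows from solving the Hoeffding bound for $k$. The short Python sketch below does this arithmetic; it assumes the bound has the form $2\exp(-8kt^2/n^2)$ for matching sizes in $[0, n/2]$, and the function name `hoeffding_samples` is illustrative.

```python
import math


def hoeffding_samples(n, t, confidence=0.95):
    """Smallest k with 2 * exp(-8 * k * t**2 / n**2) <= 1 - confidence."""
    delta = 1.0 - confidence
    return math.ceil(n * n * math.log(2.0 / delta) / (8.0 * t * t))
```

The constant $\ln(40)/8 \approx 0.4611$ is the coefficient of $n^2/t^2$ at the 95% level; with $n = 100$ and $t = 0.35$ the function returns roughly 37,600 samples.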

Notice that the previous bound does not use much of the structure of the matchings. In
particular, (5) relies solely on the fact that the sizes of the matchings lie in $[0, n/2]$. However,
this simple concentration estimate is essentially best possible. This holds because we can
construct a graph and a strategy $S$ for it which obtains a small matching half of the time
and a large matching half of the time. The variance of the size of the matching obtained
by $S$ is then essentially as large as possible when compared to any random variable taking
values in $[0, n/2]$; in such a case, Hoeffding's inequality is rather tight. More formally, we have
the following lemma.

Lemma 10. There is a graph $G$ on $n$ nodes and a strategy $S$ for querying $G$ such that



for all $t < n/10$.

5.2 Upper bound on OPT

Clearly the size $\mu(G)$ of a maximum cardinality matching in $G$ is an upper bound on $OPT(G)$,
since the maximum matching in any realization of $G$ has size at most $\mu(G)$. However, this
bound can be made arbitrarily weak by considering small edge weights, e.g. $G$ is a set of
disjoint edges, each with weight $p$, so $\mu(G) = e(G)$ but $OPT(G) = p \cdot e(G)$.

Notice that OPT can actually be upper bounded by the expected size of the maximum
matching over the realizations of $G$; that is, for $\sigma \sim G$ we have $OPT(G) \le E[\mu(\sigma)] = E[\mu]$.

This bound is tighter than $\mu(G)$ and, as Section 5.3 supports, is oftentimes very close to
$OPT(G)$. An important remark is that $E[\mu]$ is a valid upper bound even for the non-commit or
clairvoyant version of the problem, where the strategy can first find out exactly which edges
are in the realization and then decide which matching to take. Again we encounter the issue of
calculating or at least estimating $E[\mu]$.
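
The gap between $\mu(G)$ and $E[\mu]$ in the disjoint-edges example can be checked by brute force on a small instance. The following Python sketch enumerates every realization, which is only feasible for a handful of edges; the helper names `max_matching_size` and `expected_max_matching` are illustrative and the exhaustive matching routine is a stand-in for a proper maximum matching algorithm.

```python
from itertools import product


def max_matching_size(edges):
    """Size of a maximum matching, by exhaustive branching (tiny graphs only)."""
    if not edges:
        return 0
    (u, v), rest = edges[0], edges[1:]
    # Either skip the first edge, or take it and drop all edges touching u or v.
    skip = max_matching_size(rest)
    take = 1 + max_matching_size([e for e in rest if u not in e and v not in e])
    return max(skip, take)


def expected_max_matching(weights):
    """E[mu]: expected maximum matching size over all 2^m realizations."""
    edges = list(weights)
    total = 0.0
    for scenario in product([0, 1], repeat=len(edges)):
        prob = 1.0
        for e, present in zip(edges, scenario):
            prob *= weights[e] if present else 1.0 - weights[e]
        realized = [e for e, present in zip(edges, scenario) if present]
        total += prob * max_matching_size(realized)
    return total
```

For three disjoint edges of weight 0.1, the deterministic bound $\mu(G)$ equals 3 while $E[\mu]$ equals 0.3.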

Estimating $E[\mu]$. Clearly the sample average estimator used in the previous section can
also be used to estimate $E[\mu]$ and the Hoeffding-based bound still holds. However, we can
get substantially better concentration results by bounding the variance of the estimate more
carefully. We make use of this tighter concentration to reduce the computational effort needed to
estimate the upper bound on OPT.
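
To see how a variance bound enters, recall one standard form of Bernstein's inequality. The display below is a sketch under stated assumptions rather than the exact bound used in the paper: the samples $\mu(\sigma_1), \ldots, \mu(\sigma_k)$ are independent with values in $[0, n/2]$, $v$ is any upper bound on their common variance, and the constants may differ from the authors' version.

```latex
\Pr\big(|\eta - E[\mu]| > t\big) \;\le\; 2\exp\!\left(-\frac{k t^2}{2v + \tfrac{1}{3}\, n t}\right),
\qquad
\eta = \frac{1}{k}\sum_{i=1}^{k}\mu(\sigma_i).
```

When $v$ grows only linearly in $n$, the exponent scales like $kt^2/n$ for moderate $t$, compared with $kt^2/n^2$ in the Hoeffding-based bound.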

Again consider a weighted graph $G$ on $n$ nodes and $m$ edges and consider $k$ independent
scenarios $\sigma_1, \ldots, \sigma_k \sim G$. We take $\eta = (1/k)\sum_{i=1}^{k}\mu(\sigma_i)$ as the estimate of $E[\mu]$, since
clearly $E[\eta] = E[\mu]$. Our goal now is to bound the variance of $\mu(\sigma_i)$, which will then be used
to provide tighter concentration results for $\eta$. For that we need to introduce the concept of
self-bounding functions.

A nonnegative function $g : \mathcal{X}^d \to \mathbb{R}$ is self-bounding if there exist functions
$g_i : \mathcal{X}^{d-1} \to \mathbb{R}$ such that for all $x_1, \ldots, x_d \in \mathcal{X}$ the following hold:

$$0 \le g(x_1, \ldots, x_d) - g_i(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_d) \le 1 \quad \text{for all } i = 1, \ldots, d$$

and

$$\sum_{i=1}^{d}\big[g(x_1, \ldots, x_d) - g_i(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_d)\big] \le g(x_1, \ldots, x_d).$$

The following lemma motivates the definition of self-bounding functions. 

Lemma 11 ([4]). Suppose $g : \mathcal{X}^d \to \mathbb{R}$ is a measurable self-bounding function. Let $X_1, \ldots, X_d$
be independent random variables with support on $\mathcal{X}$ and let $Z = g(X_1, \ldots, X_d)$. Then

$$\mathrm{Var}[Z] \le E[Z].$$

The connection between the previous lemma and our goal of bounding $\mathrm{Var}[\mu(\sigma_i)]$ comes
from the fact that $\sigma_i$ can be seen as $e(G)$ independent indicator random variables for the
edges of $G$ and $\mu(\sigma_i)$ can be seen as a self-bounding function. To make this precise let
$e_1, e_2, \ldots, e_m$ be the edges of $G$ and let $X_1, X_2, \ldots, X_m$ be independent Bernoulli random
variables with $\Pr(X_j = 1) = p_{e_j}$. Also, for an indicator vector $x \in \{0, 1\}^m$ of the edges of
$G$, let $\mu'(x)$ be the size of the maximum matching in the subgraph of $G$ induced by $x$ (i.e.
which contains the edge $e_j$ iff $x_j = 1$). It is easy to see that $\mu'(X_1, X_2, \ldots, X_m) = \mu(\sigma_i)$ and
the next lemma asserts that $\mu'(\cdot)$ is self-bounding.

Lemma 12. The function $\mu'$ is self-bounding.

Since $\mu(\sigma_i) = \mu'(X_1, \ldots, X_m)$, Lemma 11 gives the desired bound $\mathrm{Var}[\mu(\sigma_i)] \le E[\mu] \le n/2$.
Now that we have a handle on this variance, we can invoke Bernstein's inequality jl] to
bound the concentration of $\eta = (1/k)\sum_{i=1}^{k}\mu(\sigma_i)$ as follows:



When compared to (5), the lack of an extra term $1/n$ in the exponent makes (6) a much
stronger bound.
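
The self-bounding property of the maximum matching size can be checked numerically on small graphs. The Python sketch below is only a sanity check, not a proof, and it assumes the natural choice of $g_i$ as the maximum matching size after deleting edge $e_i$ (the paper's proof may use a different witness); the function names are illustrative.

```python
from itertools import product


def max_matching_size(edges):
    """Size of a maximum matching, by exhaustive branching (tiny graphs only)."""
    if not edges:
        return 0
    (u, v), rest = edges[0], edges[1:]
    skip = max_matching_size(rest)
    take = 1 + max_matching_size([e for e in rest if u not in e and v not in e])
    return max(skip, take)


def is_self_bounding_on(edges):
    """Check both self-bounding conditions for every indicator vector x."""
    m = len(edges)
    for x in product([0, 1], repeat=m):
        present = [e for e, xi in zip(edges, x) if xi]
        g = max_matching_size(present)
        total = 0
        for i in range(m):
            # g_i: maximum matching size when edge i is forced to be absent.
            without_i = [e for j, e in enumerate(edges) if x[j] and j != i]
            diff = g - max_matching_size(without_i)
            if not 0 <= diff <= 1:
                return False
            total += diff
        if total > g:
            return False
    return True
```

The second condition holds because only the edges of one fixed maximum matching can lower the matching size when removed, so the sum of the differences is at most the matching size itself.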

5.3 Comparison of heuristics 

In this section we present experimental results comparing simple querying strategies over
random weighted compatibility graphs. All strategies use to some extent an optimization
based on Lemma 1; that is, they query a pendant edge first if one exists. The heuristics
considered in the experiments are described next.

Maximum probability. This strategy first queries pendant edges in the residual graph. 
In case none exists, it queries the edge with highest weight. 

Minimum probability. Similar to the previous strategy but, when no pendant edge exists,
it queries the edge with lowest weight.

Minimum degree. This strategy first queries pendant edges in the residual graph. In case 
none exists, it queries the edge which has minimum degree in the residual graph, where 
the degree of an edge is defined as the sum of the degrees of its endpoints. 

Minimum average degree. The average degree of an edge {u,v) is defined as the sum of 
the weights of edges incident to u plus the sum of the weights of edges incident to v. 
This strategy first queries pendant edges in the residual graph. In case none exists, it 
queries the edge which has minimum average degree in the residual graph. 

Batch successive matching. First, this strategy queries pendant edges in the residual 
graph. After no more pendant edges exist, it finds a maximum cardinality matching 
in the residual graph. Then it queries all the edges in this matching in an arbitrary 
order. After all these edges are queried the process is repeated. 

Batch successive weighted matching. Similar to the previous strategy, but now in each 
round it computes a maximum weighted matching with edge weights 1 — p. 

Successive weighted matching q. First, this strategy queries pendant edges in the residual
graph. After no more pendant edges exist, it finds a maximum weighted matching
(with edge weights $1 - p$) in the residual graph. Then it queries one arbitrary edge in
this matching. The process is then repeated.

Successive weighted matching p. Similar to the previous strategy, but the edge weights
used in the maximum weighted matchings are simply $p$.

maxP    minP    minDeg  minAvgDeg  batchSM  batchWSM  SWMq    SWMp    E[μ]
24.91   26.52   26.83   28.21      25.81    26.92     27.68   26.65   28.71
22.43   23.25   24.07   24.99      22.69    23.69     24.41   23.93   25.51
18.64   19.03   19.11   19.53      18.87    19.24     19.36   19.17   19.82
16.36   15.57   16.62   16.79      16.49    16.56     16.67   16.75   16.85
19.51   21.36   20.69   22.22      20.06    21.06     21.86   20.17   22.56
22.74   23.20   23.53   25.14      23.20    24.00     24.77   24.02   25.64
24.16   24.38   25.02   26.44      24.54    25.33     26.08   25.28   26.82
21.36   22.07   23.18   24.30      22.41    23.15     23.90   23.68   24.66
25.39   26.06   27.20   27.98      26.11    26.79     27.47   26.63   28.43
22.87   21.56   23.61   24.06      23.06    23.47     23.81   23.60   24.43
-------------------------------------------------------------------------------
21.83   22.30   22.98   23.97      22.32    23.02     23.60   22.99   24.34

Table 1: Comparison of different strategies. Each row corresponds to an instance, except
the last one which reports the average value of each strategy over all instances.
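
The pendant-first rule shared by these heuristics, combined with the maximum probability tie-break, can be sketched in a few lines of Python. This is an illustrative sketch and not the code used in the experiments: it assumes that a pendant edge is one with an endpoint of degree 1 in the residual graph, it samples each edge at the moment it is queried (equivalent to sampling a scenario up front, since edges are independent and each is queried at most once), and the names `pendant_edge`, `run_max_probability` and `estimate_value` are hypothetical.

```python
import random


def pendant_edge(edges):
    """Return some edge with an endpoint of degree 1 in the residual graph, or None."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    for u, v in sorted(edges):
        if degree[u] == 1 or degree[v] == 1:
            return (u, v)
    return None


def run_max_probability(weights, rng):
    """Simulate one run of the maximum probability heuristic; return the matching size."""
    residual = dict(weights)
    matched = 0
    while residual:
        e = pendant_edge(residual)
        if e is None:
            # No pendant edge: query the edge with the highest weight.
            e = max(sorted(residual), key=residual.get)
        u, v = e
        if rng.random() < residual[e]:
            # The edge exists: commit to it and drop every edge touching u or v.
            matched += 1
            residual = {f: p for f, p in residual.items() if u not in f and v not in f}
        else:
            del residual[e]
    return matched


def estimate_value(weights, samples, seed=0):
    """Sample average of the matching size over independent runs."""
    rng = random.Random(seed)
    return sum(run_max_probability(weights, rng) for _ in range(samples)) / samples
```

On a path with three certain edges the pendant-first rule matches two edges, whereas querying the middle edge first would match only one.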

The instances used in the experiments consist of random weighted compatibility graphs
on 100 nodes, generated as described in Section 4.1. Simulating exchange pools with 100
patient/donor pairs is optimistic but not unrealistic, as pointed out in [21]. We also carried
out experiments on graphs with fewer than 100 nodes, but these graphs did not seem to be large
enough to discriminate among the querying strategies.

In order to estimate the value of each strategy we employed the sample average estimation
discussed in Section 5.1. For each execution, we used 38,000 samples in order to obtain a
good estimate; according to inequality (5), with probability 0.95 the estimate is within ±0.35
of the actual expected size of the matching obtained by the strategy. In order to obtain an estimated
upper bound on OPT we used the sample average estimate of $E[\mu]$ as described in Section 5.2. The
number of samples was chosen to obtain an estimate which is within ±0.1 of $E[\mu]$ with
probability 0.95.

The results of the experiments are presented in Table 1. The first eight columns
correspond to the strategies in the same order as they were described and the last column presents
the upper bound $E[\mu]$ on OPT. In each row, except the last one, we present the estimated
value of the strategies on a given instance. The last row of the table indicates the average
value of each strategy over all instances.

Table 1 shows that all these simple strategies perform very well. Surprisingly, these
heuristics are actually close to optimal clairvoyant strategies, since $E[\mu]$ is a valid upper
bound for the non-commit version of this problem as well. These results indicate that, for
the kidney exchange application, the commit requirement in the formulation of the problem
is not too restrictive, in that good solutions are still obtainable under this constraint. Notice
that strategy minAvgDeg in particular outperforms all others in every instance of the test
set. Moreover, its value always stays relatively close to the upper bound $E[\mu]$, being on
average within 1.5%.

Figure 3: Sample average estimates of the value of minAvgDeg on the first instance of the
test set as a function of the number of samples. The horizontal axis indicates the number of
samples divided by 100.

A caveat to these results is that the confidence interval of ±0.35 for the value of the
strategies does not allow complete discrimination among the best performing heuristics.
The confidence intervals obtained from (5), due to its generality, seem to be much looser
than the actual bounds on the estimates. This is reinforced by Figure 3, which displays the
sample average as a function of the number of samples; notice that with 2,500 samples the
estimate already starts oscillating closely around the reported value of 28.21. We remark
that a similar convergence profile holds for the other strategies tested.

6 Conclusions and future work 

In this paper we considered the query-commit problem, a model for matchings which incor- 
porates uncertainty on the edges of the graph in a way that is suitable for time-sensitive 
applications. By using the fact that some edges of the graph can be queried without com- 
promising optimality, we show how to obtain an optimal querying strategy for sparse graphs 
in polynomial time. However, the dependency on the sparsity of the graph is doubly expo- 
nential. An interesting open question is to improve this running time, which may indirectly 
reveal other important properties of optimal strategies. On a similar note, another open
question is to prove lower bounds on the computational complexity needed for finding
optimal strategies in general graphs.

In Section 5 we evaluated querying strategies over instances of the kidney exchange
application and showed that even simple heuristics perform surprisingly well, even when compared
to non-committing strategies. An important open question is to design a procedure which is
able to provide even better strategies, possibly by starting with a heuristic and successively
improving it (e.g. in a local search fashion). However, two hindrances for such procedures are
the lack of an efficient algorithm to compute the value of a strategy and the difficulty in representing
strategies. As noted previously, the size of a decision tree may be exponential in the size of
the input graph.

Another possibility is to extend the current model to even more realistic setups. For 
instance, one could consider correlated uncertainty on the edges. In the context of kidney 
exchanges, the uncertainty on the PRA level of a node introduces correlated uncertainty on 
all edges incident to it. In addition, recent works in deterministic kidney matchings have 
considered not only 2-way exchanges but also longer chains of exchanges lU |2T], yielding
additional transplants. A direction for future research is to study a suitable modification of 
the query-commit problem which can model uncertainty on longer exchanges. 

Finally, Theorem 2 indicates the potential of large kidney exchange programs. It would
be of great value to obtain a more precise assessment of this potential and to address the
logistical problems associated with nationwide transplant programs.

Acknowledgments. We thank Tuomas Sandholm, Willem-Jan van Hoeve and David
Abraham for helpful discussions.

A Proofs for Section 3

A.1 Proof of Lemma 1

Let $S^*$ be an optimal strategy for $G$ and consider the strategy $S$ which first queries $e$ and then
proceeds exactly as $S^*$. We clarify what happens in the following situation: the realization
$\sigma$ contains both $e$ and another edge $e'$ incident to $e$, and $M(S^*, G)(\sigma)$ contains $e'$. Clearly $S$
adds $e$ to the matching in the first step and cannot add $e'$ while simulating $S^*$; in this case,
$S$ still probes $e'$ whenever $S^*$ does so and adapts accordingly, although $e'$ is not added to
the matching.

We claim that $S$ is optimal, which proves the lemma. The definition of $S$ leads to
the following observations. If $e$ does not belong to the realization $\sigma$ then $M(S,G)(\sigma) =
M(S^*,G)(\sigma)$. On the other hand, if $e$ belongs to $\sigma$ then: (i) the $k$th edge queried by $S$ is
exactly the $(k-1)$th edge queried by $S^*$ (the minus 1 comes from the fact that $S$ queries
$e$ before simulating $S^*$) and (ii) every edge added by $S^*$ to the matching $M(S^*,G)(\sigma)$ is
also added to the matching $M(S,G)(\sigma)$, the only possible exception being a single edge $e'$
of $M(S^*,G)(\sigma)$ incident to $e$; since $e$ is pendant, $M(S^*,G)(\sigma)$ contains at most one edge
incident to $e$. In the worst case $M(S, G)(\sigma)$ contains the edges in $(M(S^*, G)(\sigma) \cup \{e\}) \setminus \{e'\}$
and we still have $|M(S,G)(\sigma)| \ge |M(S^*,G)(\sigma)|$.

Together, these observations imply that $EM(S, G) \ge EM(S^*, G)$ and hence $S$ is optimal.

A.2 Proof of Lemma 2



Let $x_1, x_2, \ldots, x_k$ be the path of $T^S$ induced by its execution over the scenario $\sigma$. By definition
of $S$, $a(x_i)$ is pendant in the graph $G^{x_i}$.

By means of contradiction suppose that $M(S,G)(\sigma)$ is not a maximum matching in $\sigma$,
namely there is a matching $M^*$ in $\sigma$ such that $|M(S,G)(\sigma)| < |M^*|$. Then from Berge's
Lemma [TB] there must be an augmenting path $P$ in $\sigma$ with respect to $M(S, G)(\sigma)$ and $M^*$.

Let $a(x_i)$ be an edge of $P$ queried by $T^S$. As mentioned in Section 2, $M(T^S, G)(\sigma) =
\{a(x_1), a(x_2), \ldots, a(x_k)\} \cap \sigma$, and since $a(x_i) \in P \subseteq \sigma$ we have that $a(x_i)$ belongs to
$M(T^S, G)(\sigma)$. Then, since $P$ is augmenting, there are two edges $e, e' \in P$ which are
incident to $a(x_i)$ and belong to $M^*$. But since $a(x_i)$ is a pendant edge of $G^{x_i}$, it must be that
either $e$ or $e'$ does not belong to $G^{x_i}$ and without loss of generality we assume the former.
The construction of $G^{x_i}$ implies that there is an edge $a(x_j)$ with $j < i$ such that $a(x_j) \in \sigma$
and $a(x_j)$ is incident to $e$.

Since $a(x_j) \in \sigma$, we know that $a(x_j)$ also belongs to $M(T^S, G)(\sigma)$. Since $M(T^S, G)(\sigma)$ is
a matching, it follows that $a(x_j)$ is not incident to an internal node of $P$ and hence it must
be incident to the endpoint of $e$ which is also an endpoint of $P$. However, this contradicts
the fact that $P$ is an augmenting path, which completes the proof of the lemma.

A.3 Proof of Lemma 3

For each $R \in \mathcal{R}(S(H, e), H)$ let $T^R$ be an optimal decision tree for $R$. Consider the natural
CDT $T'$ for $H$ which has $S^{\mathrm{root}(T')} = S(H, e)$ and in which the children of $\mathrm{root}(T')$ are the trees $T^R$'s.

We claim that $T'$ is optimal for $H$. To argue that, let $T$ be the decision tree corresponding
to $T'$. Using the correspondence between CDT's and decision trees, it suffices to prove that
$T$ is an optimal decision tree for $H$.

We prove that $T_x$ is an optimal decision tree for $H^x$, for every node $x \in T \setminus \bigcup_R T^R$; this
is done by reverse induction on the depth of $x$ in $T$. The fact that the trees $T^R$ are optimal
removes the necessity of a separate base case, so consider a node $x \in T \setminus \bigcup_R T^R$ and assume
that $T_{r(x)}$ is optimal for $H^{r(x)}$ and $T_{l(x)}$ is optimal for $H^{l(x)}$. By the definition of $S$ we have
that $a(x)$ is pendant in $H^x$. Therefore, Lemma 1 asserts that there is an optimal decision
tree for $H^x$ whose root queries edge $a(x)$. Then it is easy to see that the optimality of $T_{r(x)}$
and $T_{l(x)}$ implies that actually $T_x$ is one such optimal decision tree for $H^x$, which concludes
the inductive step and the proof of the lemma.

A.4 Proof of Lemma 4

First let us relax the definition of $\mathcal{F}$ by defining $\mathcal{F}^+$ as follows. A CDT $T$ for a subgraph $H$
of $G$ belongs to $\mathcal{F}^+$ if each node $x$ of $T$ is either associated to a strategy which consists of
querying one edge incident to $V(H^x) \cap V_{\ge 3}$ or it is associated to a strategy $S(H^x, e)$ for some
$P_i$ and some $e \in E(H^x) \cap E(P_i)$. Notice that $\mathcal{F}$ is the set of CDT's in $\mathcal{F}^+$ which have height at
most $9d + 1$.

Claim 1. There is an optimal CDT for $G$ which belongs to $\mathcal{F}^+$.

Proof. We prove that for each subgraph $H$ of $G$ there is an optimal CDT for $H$ in $\mathcal{F}^+$. We
proceed by induction on the number of edges of the subgraph, with trivial base case for
subgraphs with no edges.

Consider a subgraph $H$ of $G$ with at least one edge and let $T^*$ be an optimal querying
strategy for it. Suppose that the first edge $e$ queried by $T^*$ is incident to $V_{\ge 3}$. For each
$R \in \mathcal{R}(S^{\mathrm{root}(T^*)}, H)$ let $T^R \in \mathcal{F}^+$ be an optimal CDT for $R$, the existence of which is given
by the inductive hypothesis. Then define the decision tree $T$ for $H$ as follows: its root
queries edge $e$ and the subtrees of $\mathrm{root}(T)$ are the trees $\{T^R\}$. Using the recursive equation
for $EM(T, H)$ (equation (2) in the main paper) it is easy to see that the optimality of the
trees $\{T^R\}$ implies that $EM(T,H) \ge EM(T^*,H)$ and hence $T$ is optimal. Moreover, $T$
clearly belongs to $\mathcal{F}^+$, which concludes the inductive step in this case.

Now suppose that $e$ is not incident to $V_{\ge 3}$; this implies that $e$ belongs to a path $P_i$.
Let $T^*$ be an optimal CDT for $H$ whose root is associated to $S(H,e)$, whose existence is
guaranteed by Lemma 3. Again, for each $R \in \mathcal{R}(S(H,e),H)$ let $T^R \in \mathcal{F}^+$ be an optimal
CDT for $R$. Now we construct the CDT $T$ from $T^*$ by replacing the subtrees of $\mathrm{root}(T^*)$ by
the trees $\{T^R\}$. As in the previous case, the optimality of the trees $\{T^R\}$ implies that $T$ is
optimal and also we have that $T \in \mathcal{F}^+$. This concludes the inductive step and the proof of
the claim. □

Proof of Lemma 4. Let $T \in \mathcal{F}^+$ be an optimal CDT for $G$ with the minimum number of
nodes. We claim that $T$ has height at most $9d + 1$.

By means of contradiction suppose not and consider a path $Q$ from $\mathrm{root}(T)$ to one of
its leaves which has more than $9d$ internal nodes. Since there are at most $6d$ edges incident
to $V_{\ge 3}$ and at most $3d$ paths $P_i$'s in $G$, this means that either: (i) two nodes in $Q$ query
the same edge incident to $V_{\ge 3}$ or (ii) two nodes $x, x'$ in $Q$ are associated to two strategies
$S(G^x, e)$ and $S(G^{x'}, e')$, where both $e$ and $e'$ belong to the same path $P_i$. Case (i) is forbidden
by the definition of a CDT, so we consider case (ii).

Without loss of generality assume that $x$ is closer to the root of $T$ than $x'$. Notice
that by the definition of $S(G^{x'}, e')$ we have that $e' \in E(G^{x'}) \cap E(P_i)$. Now we use the
fact that $S(G^x, e)$ removes all edges in $P_i$, that is, for every $R \in \mathcal{R}(S(G^x, e), G^x)$ we have
$E(R) \cap E(P_i) = \emptyset$. But the fact that $x'$ is a descendant of $x$ implies that $G^{x'}$ is a subgraph of
a residual graph in $\mathcal{R}(S(G^x, e), G^x)$ and hence $E(G^{x'}) \cap E(P_i) = \emptyset$. This contradicts the previous
observation that $e' \in E(G^{x'}) \cap E(P_i)$, which implies that $T$ has height at most $9d + 1$ and
concludes the proof of the lemma. □

A.5 Proof of Lemma 5

Consider a tree $T \in \mathcal{F}$; we claim that each node in $T$ has at most 4 children. Equivalently,
if $x$ is a node in $T$ we want to upper bound $|\mathcal{R}(S^x, G^x)|$. If $x$ is associated to a strategy
$S(G^x, e)$ then as noted previously we have that $|\mathcal{R}(S^x, G^x)| \le 4$. Now if $x$ is associated to
a strategy that only queries one edge $e$ incident to $V_{\ge 3}$ then, as in standard decision trees,
$\mathcal{R}(S^x, G^x) = \{G^x \setminus N(e), G^x \setminus e\}$. It follows that the outdegree of each tree in $\mathcal{F}$ is at most
4.

Since a tree in $\mathcal{F}$ has height at most $9d + 1$, the previous degree bound implies that each
such tree has at most $4^{9d+1}$ nodes. Now notice that each node has one of $e(G)$ possible
strategies, because the strategies $S^x$'s allowed in $\mathcal{F}$ are uniquely determined by the choice of
an edge. These observations imply that there are at most $e(G)^{4^{9d+1}}$ trees in $\mathcal{F}$.
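
To get a sense of how quickly this bound grows with $d$, the following Python sketch evaluates its base-10 logarithm. It simply plugs numbers into the expression $e(G)^{4^{9d+1}}$ stated in the proof; the function name `cdt_count_log10_bound` is illustrative.

```python
import math


def cdt_count_log10_bound(num_edges, d):
    """log10 of the bound e(G)^(4^(9d+1)) on the number of candidate CDTs."""
    max_nodes = 4 ** (9 * d + 1)  # bound on the number of nodes per tree
    return max_nodes * math.log10(num_edges)
```

For a graph with 100 edges the bound has 9 digits when $d = 0$ but more than two million digits when $d = 1$, which illustrates why the enumeration is polynomial only for constant $d$.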



A.6 Proof of Lemma 6

Let $P_i = (u_1 e_1 u_2 e_2 \cdots e_q u_{q+1})$ and let $S^a_b$ denote the strategy which queries the edges $e_k$
with $k$ between $a$ and $b$ in order of the indices from $a$ to $b$. That is, if $a \le b$ then $S^a_b$ queries
the edges $e_a, e_{a+1}, \ldots, e_b$ and if $a > b$ then it queries edges $e_a, e_{a-1}, \ldots, e_b$, always ignoring
edges outside $H$. To simplify the notation define the random matching $M^a_b = M(S^a_b, H)$ and
without ambiguity we use $M^a_b$ to also denote the random variable $|M^a_b|$ corresponding to its cardinality.

First we prove that we can compute in polynomial time the following quantities corresponding
to $S_a^1$, for all $a \geq 1$: $E[M_a^1 \mid e_1 \in M_a^1]$, $E[M_a^1 \mid e_1 \notin M_a^1]$ and $\Pr(e_1 \in M_a^1)$. Then we
show how to use this information to compute the values required by the lemma.

We proceed in a dynamic programming fashion. First, it is trivial to compute the quantities
associated to $S_1^1$, and now we want to compute the quantities associated to $S_a^1$ assuming
that those for the $S_{a'}^1$'s with $a' < a$ have already been computed. It is not difficult to see
that

$$E[M_a^1 \mid e_1 \in M_a^1] = p_{e_a}\left(1 + E[M_{a-2}^1 \mid e_1 \in M_{a-2}^1]\right) + (1 - p_{e_a})\, E[M_{a-1}^1 \mid e_1 \in M_{a-1}^1].$$



Moreover, the analogous expression with complementary conditionings also holds:

$$E[M_a^1 \mid e_1 \notin M_a^1] = p_{e_a}\left(1 + E[M_{a-2}^1 \mid e_1 \notin M_{a-2}^1]\right) + (1 - p_{e_a})\, E[M_{a-1}^1 \mid e_1 \notin M_{a-1}^1].$$



Finally, we also have $\Pr(e_1 \in M_a^1) = p_{e_a} \Pr(e_1 \in M_{a-2}^1) + (1 - p_{e_a}) \Pr(e_1 \in M_{a-1}^1)$. All these
expressions can be easily computed using the information about $S_{a-1}^1$ and $S_{a-2}^1$ available by
the dynamic programming hypothesis, concluding the proof of our claim.
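
As an illustration of the dynamic program above, the following Python sketch computes, for the strategy that queries $e_a, e_{a-1}, \ldots, e_1$ in this order (skipping edges with an already matched endpoint), the probability that $e_1$ is matched and the expected matching size. The function names and the list `p` of edge probabilities are illustrative assumptions and do not come from the paper. As a design choice of the sketch, it tracks the joint quantity $E[|M| \cdot 1\{e_1 \in M\}]$, which satisfies a linear recursion, and recovers the conditional expectations by dividing by the corresponding probability. A brute-force enumeration over all realizations is included as a sanity check.

```python
from itertools import product

def path_dp(p):
    """p[k-1] is the probability that edge e_k exists, for k = 1..q.

    For each a = 0..q, consider the strategy querying e_a, e_{a-1}, ..., e_1
    (skipping blocked edges) and return three lists indexed by a:
      pr1[a]   = Pr(e_1 in M)
      joint[a] = E[|M| * 1{e_1 in M}]
      total[a] = E[|M|]
    """
    q = len(p)
    pr1, joint, total = [0.0] * (q + 1), [0.0] * (q + 1), [0.0] * (q + 1)
    for a in range(1, q + 1):
        pa = p[a - 1]
        if a == 1:
            pr1[a] = joint[a] = total[a] = pa
            continue
        # If e_a exists it is matched and e_{a-1} is blocked: continue from e_{a-2}.
        # If e_a is missing: continue from e_{a-1}.
        pr1[a] = pa * pr1[a - 2] + (1 - pa) * pr1[a - 1]
        joint[a] = pa * (pr1[a - 2] + joint[a - 2]) + (1 - pa) * joint[a - 1]
        total[a] = pa * (1 + total[a - 2]) + (1 - pa) * total[a - 1]
    return pr1, joint, total

def conditional_sizes(pr1, joint, total, a):
    """E[|M| | e_1 in M] and E[|M| | e_1 not in M]; assumes 0 < pr1[a] < 1."""
    return joint[a] / pr1[a], (total[a] - joint[a]) / (1 - pr1[a])

def brute_force(p, a):
    """Same three quantities, by enumerating every realization of e_1..e_a."""
    pr1 = joint = total = 0.0
    for real in product([0, 1], repeat=a):
        weight = 1.0
        for k in range(a):
            weight *= p[k] if real[k] else 1 - p[k]
        used = [False] * (a + 2)          # used[v] is True once node u_v is matched
        size, has_e1 = 0, False
        for k in range(a, 0, -1):         # query e_a, e_{a-1}, ..., e_1
            if real[k - 1] and not used[k] and not used[k + 1]:
                used[k] = used[k + 1] = True
                size += 1
                has_e1 = has_e1 or k == 1
        pr1 += weight * has_e1
        joint += weight * size * has_e1
        total += weight * size
    return pr1, joint, total
```

The table has $q + 1$ entries and each entry is filled in constant time, which is consistent with the polynomial running time claimed in the text.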

By suitably relabeling the edges $e_k$, the above result implies that we can also compute
$E[M_a^q \mid e_q \in M_a^q]$, $E[M_a^q \mid e_q \notin M_a^q]$ and $\Pr(e_q \in M_a^q)$ for $a \geq 1$. A final remark is that using
the law of total expectation we can also compute $E[M_a^1]$ and $E[M_a^q]$.

Now let $a$ be such that $e = e_a$. We show how to compute the quantities required by the
lemma: $E[M]$, $\Pr(u_1 \in M)$, $\Pr(u_{q+1} \in M)$ and $\Pr(u_1 \in M \wedge u_{q+1} \in M)$. It is useful to think
of $S(H, e_a)$ roughly as the strategy which queries $e_a$ first, then queries according to $S_{a'}^1$, and
then according to $S_{a''}^q$, for suitable $a'$ and $a''$. To make this more formal we need to split into
a few cases.



Case 1: $1 < a < q$. Notice that since $e_a \in H$ and $e_a$ is the first edge queried by $S(H, e)$,
the event $e_a \in M$ is the same as the event that the edge $e_a$ exists (both of which happen with
probability $p_{e_a}$). In addition, since by hypothesis $P_i$ is a path with more than one node, we
have $u_1 \neq u_{q+1}$ and thus no edge in the set $\{e_k\}_{k=1}^{a-1}$ intersects an edge in the set $\{e_k\}_{k=a+1}^{q}$.
This guarantees that the outcomes of, say, strategies $S_{a-1}^1$ and $S_{a+1}^q$ are independent.
These observations give the following equations:

$$E[M \mid e_a \in M] = 1 + E[M_{a-2}^1] + E[M_{a+2}^q]$$
$$E[M \mid e_a \notin M] = E[M_{a-1}^1] + E[M_{a+1}^q].$$



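Also using the fact that $1 < a < q$, and hence $e_a$ is not incident to either $u_1$ or $u_{q+1}$, we
obtain that:

$$\Pr(u_1 \in M \mid e_a \in M) = \Pr(u_1 \in M_{a-2}^1) = \Pr(e_1 \in M_{a-2}^1)$$
$$\Pr(u_{q+1} \in M \mid e_a \in M) = \Pr(u_{q+1} \in M_{a+2}^q) = \Pr(e_q \in M_{a+2}^q)$$
$$\Pr(u_1 \in M \mid e_a \notin M) = \Pr(u_1 \in M_{a-1}^1) = \Pr(e_1 \in M_{a-1}^1)$$
$$\Pr(u_{q+1} \in M \mid e_a \notin M) = \Pr(u_{q+1} \in M_{a+1}^q) = \Pr(e_q \in M_{a+1}^q).$$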

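Using the additional independence remark made previously, we also have that

$$\Pr(u_1 \in M \wedge u_{q+1} \in M \mid e_a \in M) = \Pr(e_1 \in M_{a-2}^1) \Pr(e_q \in M_{a+2}^q)$$
$$\Pr(u_1 \in M \wedge u_{q+1} \in M \mid e_a \notin M) = \Pr(e_1 \in M_{a-1}^1) \Pr(e_q \in M_{a+1}^q).$$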

Therefore, using the fact that $\Pr(e_a \in M) = p_{e_a}$ and the laws of total probability
and total expectation, we can compute $E[M]$, $\Pr(u_1 \in M)$, $\Pr(u_{q+1} \in M)$ and
$\Pr(u_1 \in M \wedge u_{q+1} \in M)$ in polynomial time using the information about the $S_a^b$'s.

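Case 2: $a = 1$ or $a = q$. The equations for the expectations are the same as in Case 1.
Now if $a = 1 \neq q$ then

$$\Pr(u_1 \in M \mid e_a \in M) = 1$$
$$\Pr(u_{q+1} \in M \mid e_a \in M) = \Pr(u_1 \in M \wedge u_{q+1} \in M \mid e_a \in M) = \Pr(e_q \in M_{a+2}^q)$$
$$\Pr(u_1 \in M \mid e_a \notin M) = \Pr(u_1 \in M \wedge u_{q+1} \in M \mid e_a \notin M) = 0$$
$$\Pr(u_{q+1} \in M \mid e_a \notin M) = \Pr(e_q \in M_{a+1}^q).$$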

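Applying a similar reasoning to the case $a = q \neq 1$ we obtain

$$\Pr(u_1 \in M \mid e_a \in M) = \Pr(u_1 \in M \wedge u_{q+1} \in M \mid e_a \in M) = \Pr(e_1 \in M_{a-2}^1)$$
$$\Pr(u_{q+1} \in M \mid e_a \in M) = 1$$
$$\Pr(u_1 \in M \mid e_a \notin M) = \Pr(e_1 \in M_{a-1}^1)$$
$$\Pr(u_{q+1} \in M \mid e_a \notin M) = \Pr(u_1 \in M \wedge u_{q+1} \in M \mid e_a \notin M) = 0.$$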

Finally, if $a = 1 = q$ then

$$\Pr(u_1 \in M) = \Pr(u_{q+1} \in M) = \Pr(u_1 \in M \wedge u_{q+1} \in M) = p_{e_a}.$$






Therefore, we can again compute $E[M]$, $\Pr(u_1 \in M)$, $\Pr(u_{q+1} \in M)$ and
$\Pr(u_1 \in M \wedge u_{q+1} \in M)$ in polynomial time using the information about the $S_a^b$'s.

Since we can calculate the probabilities associated to the $S_a^b$'s in polynomial time and
then, according to Cases 1 and 2, use this information to calculate directly the values $E[M]$,
$\Pr(u_1 \in M)$, $\Pr(u_{q+1} \in M)$ and $\Pr(u_1 \in M \wedge u_{q+1} \in M)$, this concludes the proof of the
lemma.
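
The case analysis above can be sketched in Python as follows. The sketch assumes that the strategy queries $e_a$ first, then $e_{a-1}, \ldots, e_1$, and then $e_{a+1}, \ldots, e_q$, always skipping edges with an already matched endpoint; this ordering, the function names and the list `p` of edge probabilities are assumptions made for illustration only. The helper `simulate` enumerates all realizations and is there purely to cross-check the closed-form combination.

```python
from itertools import product

def toward_first(p):
    """p lists the probabilities of consecutive path edges f_1, ..., f_m.
    For j = 0..m, consider querying f_j, f_{j-1}, ..., f_1 and skipping any
    edge with a matched endpoint. Returns (pr, tot) where pr[j] is the
    probability that f_1 gets matched and tot[j] the expected matching size."""
    m = len(p)
    pr, tot = [0.0] * (m + 1), [0.0] * (m + 1)
    for j in range(1, m + 1):
        pj = p[j - 1]
        if j == 1:
            pr[j] = tot[j] = pj
        else:
            pr[j] = pj * pr[j - 2] + (1 - pj) * pr[j - 1]
            tot[j] = pj * (1 + tot[j - 2]) + (1 - pj) * tot[j - 1]
    return pr, tot

def path_statistics(p, a):
    """Path u_1 e_1 u_2 ... e_q u_{q+1} with Pr(e_k exists) = p[k-1].
    Returns (E|M|, Pr(u_1 matched), Pr(u_{q+1} matched), Pr(both matched))."""
    q, pa = len(p), p[a - 1]
    left_pr, left_tot = toward_first(p[:a - 1])        # e_1, ..., e_{a-1}
    right_pr, right_tot = toward_first(p[a:][::-1])    # e_q, ..., e_{a+1}
    at = lambda arr, j: arr[j] if j >= 0 else 0.0
    nl, nr = a - 1, q - a                              # edges on each side of e_a
    # Conditioned on e_a existing: the edges adjacent to e_a are blocked.
    size_in = 1 + at(left_tot, nl - 1) + at(right_tot, nr - 1)
    u1_in = 1.0 if a == 1 else at(left_pr, nl - 1)
    uq_in = 1.0 if a == q else at(right_pr, nr - 1)
    # Conditioned on e_a missing: both sides are queried in full.
    size_out = left_tot[nl] + right_tot[nr]
    u1_out, uq_out = left_pr[nl], right_pr[nr]
    mix = lambda x_in, x_out: pa * x_in + (1 - pa) * x_out
    # The two sides are independent given the outcome of e_a.
    return (mix(size_in, size_out), mix(u1_in, u1_out),
            mix(uq_in, uq_out), mix(u1_in * uq_in, u1_out * uq_out))

def simulate(p, a):
    """Same four quantities by enumerating all 2^q realizations."""
    q = len(p)
    order = [a] + list(range(a - 1, 0, -1)) + list(range(a + 1, q + 1))
    size = u1 = uq = both = 0.0
    for real in product([0, 1], repeat=q):
        w = 1.0
        for k in range(q):
            w *= p[k] if real[k] else 1 - p[k]
        used = [False] * (q + 2)           # used[v] is True once u_v is matched
        count = 0
        for k in order:
            if real[k - 1] and not used[k] and not used[k + 1]:
                used[k] = used[k + 1] = True
                count += 1
        size += w * count
        u1 += w * used[1]
        uq += w * used[q + 1]
        both += w * (used[1] and used[q + 1])
    return size, u1, uq, both
```

For example, `path_statistics([0.3, 0.5], 2)` combines the two conditional cases for a two-edge path with $e_2$ queried first, while `simulate` grows exponentially in $q$ and is only meant for small test instances.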

A.7 Proof of Lemma 7

First consider $G \setminus e$ and let $G_1$ and $G_2$ be its connected components (if $G \setminus e$ has only one
component then we set $G_2$ to be the empty graph). Since $v(G) = v(G_1) + v(G_2)$ and $e(G) =
1 + e(G_1) + e(G_2)$, we have that $e(G) - v(G) = 1 + (e(G_1) - v(G_1)) + (e(G_2) - v(G_2))$. Since
each $G_i$ is connected, $e(G_i) \geq v(G_i) - 1$ for all $i$ and thus $d \geq e(G) - v(G) \geq e(G_j) - v(G_j)$
for all $j$. This shows that all components of $G \setminus e$ are $d$-sparse.

Now consider $G \setminus N(e)$ and let $G_1, \ldots, G_k$ be its connected components. Since $G$ is
connected it must contain a distinct edge connecting $e$ to each $G_i$. Thus, $e(G) \geq 1 +
\sum_{i=1}^k (e(G_i) + 1)$. Using the fact that $v(G) = 2 + \sum_{i=1}^k v(G_i)$ we obtain

$$d \geq e(G) - v(G) \geq -1 + \sum_{i=1}^k \left(e(G_i) - v(G_i) + 1\right).$$

Again using the fact that each $G_i$ is connected, we get $e(G_i) - v(G_i) \geq -1$ for all $i$ and
hence $d \geq -1 + (e(G_j) - v(G_j) + 1)$ for all $j$, which shows that all components of $G \setminus N(e)$
are $d$-sparse.
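
The statement can be checked on a small instance with the following Python sketch. It reads "$d$-sparse" as $e(G) - v(G) \leq d$, which is how the proof above uses the term; the example graph and all function names are illustrative assumptions.

```python
def components(nodes, edges):
    """Connected components, each returned as (node_set, edge_list)."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for s in nodes:
        if s in seen:
            continue
        stack, comp = [s], {s}
        while stack:                       # depth-first search from s
            x = stack.pop()
            for y in adj[x]:
                if y not in comp:
                    comp.add(y)
                    stack.append(y)
        seen |= comp
        comps.append((comp, [(u, v) for u, v in edges if u in comp and v in comp]))
    return comps

def excess(nodes, edges):
    """e(G) - v(G); a graph is treated as d-sparse when this is at most d."""
    return len(edges) - len(nodes)

def remove_edge(nodes, edges, e):
    """G \\ e: delete only the edge e."""
    return nodes, [f for f in edges if f != e]

def remove_endpoints(nodes, edges, e):
    """G \\ N(e): delete both endpoints of e and every edge touching them."""
    u, v = e
    kept = [x for x in nodes if x not in (u, v)]
    return kept, [f for f in edges if u not in f and v not in f]

def residuals_are_sparse(nodes, edges, d):
    """Check every component of G \\ e and G \\ N(e), for every edge e."""
    for e in edges:
        for op in (remove_edge, remove_endpoints):
            sub_nodes, sub_edges = op(nodes, edges, e)
            for comp_nodes, comp_edges in components(sub_nodes, sub_edges):
                if excess(comp_nodes, comp_edges) > d:
                    return False
    return True

# A connected example: a 4-cycle with a chord and a pendant path, so d = 7 - 6 = 1.
NODES = [0, 1, 2, 3, 4, 5]
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (3, 4), (4, 5)]
```

On this example `residuals_are_sparse(NODES, EDGES, 1)` holds, in agreement with the lemma.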



B Proofs for Section 4 
B.1 Proof of Lemma 8

Let $U = \{u_1, \ldots, u_k\}$ and $V$ be the vertex classes of $G$ and for any node $u \in G$ let $E(u)$
denote the set of edges incident to $u$. Consider the strategy $S$ that queries all edges in $E(u_k)$
(in an arbitrary order), then queries all edges in $E(u_{k-1})$ and so on.

We want to upper bound the probability that a node $u_i$ is unmatched in $M(S, G)$. In
order to achieve this, consider the execution of $S$ right before it starts querying edges in $E(u_i)$
and let $M$ denote the random matching of $G$ obtained by $S$ at this point. Suppose $u_i$ is
unmatched in $M(S, G)$, which implies that no edge $(u_i, v)$ could be added to the matching.
If $(u_i, v)$ could not be added to the matching then either $(u_i, v)$ does not belong to the
realization of $G$ or $v$ is already matched in $M$. Thus, conditioning on the fact that all nodes
in $V' \subseteq V(G)$ are unmatched in $M$ we get that

$$\Pr\Big(u_i \notin M(S,G) \,\Big|\, \bigwedge_{v' \in V'} v' \notin M\Big) \leq \Pr\Big(\bigwedge_{v' \in V'} (u_i, v') \notin G \,\Big|\, \bigwedge_{v' \in V'} v' \notin M\Big). \qquad (7)$$
Since $M$ does not depend on the existence of any edge $(u_i, v)$, we can use the independence
of the edges in $G$ to obtain

$$\Pr\Big(\bigwedge_{v' \in V'} (u_i, v') \notin G \,\Big|\, \bigwedge_{v' \in V'} v' \notin M\Big) = \Pr\Big(\bigwedge_{v' \in V'} (u_i, v') \notin G\Big) = \prod_{v' \in V'} \Pr((u_i, v') \notin G) \leq q^{|V'|}$$

for all $V' \subseteq V(G)$ such that $\Pr(\bigwedge_{v' \in V'} v' \notin M) > 0$. Furthermore, notice that in every
scenario $M$ leaves at least $i$ nodes from $V$ unmatched, since it can match at most $k - i$ nodes
of $U$. Then inequality (7) reduces to the desired bound $\Pr(u_i \notin M(S, G)) \leq q^i$.

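Therefore the probability that $u_i$ is matched in $M(S, G)$ is at least $1 - q^i$. Since $G$ is
bipartite, $E[M(S, G)]$ is equal to the expected number of nodes of $U$ which are matched, so
by linearity of expectation $E[M(S, G)] \geq \sum_{i=1}^k (1 - q^i) = k - \sum_{i=1}^k q^i$. But then for any $\epsilon > 0$
we can bound the last summation as follows:

$$\sum_{i=1}^{k} q^i = \sum_{i=1}^{\lfloor \epsilon k \rfloor} q^i + \sum_{i=\lfloor \epsilon k \rfloor + 1}^{k} q^i \leq \epsilon k + k q^{\lfloor \epsilon k \rfloor},$$

which gives the desired bound of $E[M(S, G)] \geq (1 - q^{\lfloor \epsilon k \rfloor} - \epsilon) k$.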
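
The chain of bounds above can be checked numerically with the Python sketch below. It makes a simplifying assumption that is not part of the lemma: $G$ is the complete bipartite graph with $k$ nodes on each side and every edge is absent independently with probability exactly $q$. Under this assumption a node of $U$ facing $f$ free nodes of $V$ stays unmatched with probability $q^f$, so the expected matching size can be computed exactly by tracking the distribution of $f$. Function names are illustrative.

```python
from math import floor

def expected_greedy_matching(k, q):
    """Exact E|M| when the k nodes of U are processed one at a time and all
    edges of the current node are queried (complete bipartite, uniform q)."""
    dist = {k: 1.0}                  # dist[f] = Pr(exactly f nodes of V are free)
    expected = 0.0
    for _ in range(k):
        nxt = {}
        for f, pr in dist.items():
            miss = q ** f            # every edge to a free node is absent
            expected += pr * (1 - miss)
            nxt[f] = nxt.get(f, 0.0) + pr * miss
            if f > 0:
                nxt[f - 1] = nxt.get(f - 1, 0.0) + pr * (1 - miss)
        dist = nxt
    return expected

def lemma_bounds(k, q, eps):
    """The intermediate bound k - sum_i q^i and the final bound of the proof."""
    geometric = k - sum(q ** i for i in range(1, k + 1))
    final = (1 - q ** floor(eps * k) - eps) * k
    return geometric, final
```

For $k = 2$ and $q = 1/2$ the exact expectation is $1.3125$, while the intermediate bound $k - \sum_i q^i$ equals $1.25$.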



C Proofs for Section 5 
C.1 Proof of Lemma 10

Consider the graph $G$ depicted in Figure 4. It consists of $n/4$ disjoint paths and every edge
has probability 1, except edge $e_1^2$ which has probability 1/2. Consider the following adaptive
strategy $S$: it first queries $e_1^2$ and, in case it belongs to the realization of $G$, $S$ queries edges
$\{e_i^2\}$ sequentially in an arbitrary order; in case $e_1^2$ does not belong to the realization, $S$ queries
edges $\{e_i^1\}$ and $\{e_i^3\}$ sequentially in an arbitrary order. Notice that in the first case $S$ obtains
a matching of size $n/4$, whereas in the second it obtains a matching of size $n/2$.

[Figure: $n/4$ disjoint paths; the $i$-th path consists of the edges $e_i^1$, $e_i^2$, $e_i^3$.]

Figure 4: Graph for proof of Lemma 10.

Recall that $\alpha_j \sim G$ and that $\eta = (1/k) \sum_{j=1}^k |M(S, G)(\alpha_j)|$. From the previous paragraph
we know that $|M(S, G)(\alpha_j)| = n/2 + (n/2) B(1, 1/2)$, where $B(a, b)$ denotes a binomial random
variable with $a$ trials and success probability $b$. Therefore, $\eta = n/2 + (n/2k) B(k, 1/2)$.
Moreover, since $E[M(S, G)] = E[\eta] = 3n/4$, we get that

$$\Pr(\eta \geq E[M(S, G)] + t) = \Pr\left(\frac{n}{2} + \frac{n}{2k} B(k, 1/2) \geq \frac{3n}{4} + t\right) = \Pr\left(B(k, 1/2) \geq \frac{k}{2} + \frac{2kt}{n}\right).$$

However, for every $t' \in [0, k/8]$ we can lower bound $\Pr(B(k, 1/2) \geq k/2 + t')$ by $\frac{1}{15} \exp(-16 t'^2 / k)$
[18]. By our hypothesis on $t$ we have $2kt/n \leq k/8$ and hence we can employ this bound on
the last displayed inequality with $t' = 2kt/n$, obtaining:

$$\Pr(\eta \geq E[M(S, G)] + t) \geq \frac{1}{15} \exp\left(-\frac{64 k t^2}{n^2}\right).$$

Since $\eta$ is symmetric around its expected value $E[M(S, G)]$, the same bound holds
for its lower tail:

$$\Pr(\eta \leq E[M(S, G)] - t) \geq \frac{1}{15} \exp\left(-\frac{64 k t^2}{n^2}\right).$$

The lemma then follows by combining the bounds on the upper and lower tails.
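
As a numerical sanity check of the tail estimate used above, the following Python sketch compares the exact upper tail of $B(k, 1/2)$ with the quoted lower bound $\frac{1}{15}\exp(-16 t'^2/k)$, and evaluates the tail of $\eta = n/2 + (n/2k) B(k, 1/2)$ as defined in the text. The function names and the sample values of $n$, $k$ and $t$ are illustrative assumptions.

```python
from math import ceil, comb, exp

def upper_tail(k, threshold):
    """Pr(B(k, 1/2) >= threshold), computed exactly from binomial coefficients."""
    start = max(0, ceil(threshold))
    return sum(comb(k, j) for j in range(start, k + 1)) / 2 ** k

def tail_lower_bound(k, t_prime):
    """The quoted bound (1/15) * exp(-16 t'^2 / k), stated for 0 <= t' <= k/8."""
    return exp(-16 * t_prime ** 2 / k) / 15

def eta_upper_tail(n, k, t):
    """Pr(eta >= 3n/4 + t) for eta = n/2 + (n / (2k)) * B(k, 1/2)."""
    return upper_tail(k, k / 2 + 2 * k * t / n)

def eta_lower_bound(n, k, t):
    """The bound obtained by substituting t' = 2kt/n, i.e. exp(-64 k t^2 / n^2) / 15."""
    return exp(-64 * k * t ** 2 / n ** 2) / 15
```

For instance, with $n = 40$, $k = 16$ and $t = 2$ the hypothesis $2kt/n \leq k/8$ holds, and `eta_upper_tail(40, 16, 2)` can be compared directly against `eta_lower_bound(40, 16, 2)`.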
C.2 Proof of Lemma 12 

Fix $j$ and define the function $g_j(x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_m)$ as the size of the maximum
matching in the subgraph of $G$ which contains edge $e_i$, for $i \neq j$, iff $x_i = 1$; we remark that this
subgraph does not contain the edge $e_j$. It is easy to see that for every $x_1, \ldots, x_m \in \{0, 1\}$

$$0 \leq \mu'(x_1, \ldots, x_m) - g_j(x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_m) \leq 1.$$

Now let $M(x_1, \ldots, x_m)$ be a maximum matching in the subgraph of $G$ induced by
$x_1, \ldots, x_m$ and recall that $|M(x_1, \ldots, x_m)| = \mu'(x_1, \ldots, x_m)$. Notice that if $\mu'(x_1, \ldots, x_m) >
g_j(x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_m)$ then it must be the case that $e_j$ belongs to $M(x_1, \ldots, x_m)$.
Then we can charge the difference between $\mu'$ and the $g_j$'s to the edges in $M(x_1, \ldots, x_m)$:

$$\sum_{j=1}^{m} \left[\mu'(x_1, \ldots, x_m) - g_j(x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_m)\right] \leq |M(x_1, \ldots, x_m)| = \mu'(x_1, \ldots, x_m),$$

which shows that $\mu'(\cdot)$ is self-bounding.
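
The two self-bounding conditions can be verified exhaustively on a small graph with the Python sketch below. The maximum matching is computed by plain branching, which is only suitable for tiny instances; the function names and the example edge list are illustrative assumptions.

```python
from itertools import product

def max_matching(edges):
    """Size of a maximum matching, by exhaustive branching (small graphs only)."""
    if not edges:
        return 0
    (u, v), rest = edges[0], edges[1:]
    skip = max_matching(rest)                                   # do not use (u, v)
    take = 1 + max_matching([e for e in rest if u not in e and v not in e])
    return max(skip, take)

def is_self_bounding(edges):
    """Check 0 <= mu' - g_j <= 1 and sum_j (mu' - g_j) <= mu' for every x in {0,1}^m."""
    m = len(edges)
    for x in product([0, 1], repeat=m):
        present = [e for e, xi in zip(edges, x) if xi]
        mu = max_matching(present)
        total_gap = 0
        for j in range(m):
            # g_j: same realization, but edge e_j is always left out.
            without_j = [e for i, (e, xi) in enumerate(zip(edges, x)) if xi and i != j]
            gap = mu - max_matching(without_j)
            if not 0 <= gap <= 1:
                return False
            total_gap += gap
        if total_gap > mu:
            return False
    return True

# Example: a 4-cycle with one chord.
EXAMPLE_EDGES = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
```

The check enumerates all $2^m$ realizations, so it illustrates the argument on toy instances rather than serving as a proof.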





D Concentration inequalities 

For completeness we present two inequalities used to bound large deviations of sums of 
random variables [1]. 

Lemma 13 (Hoeffding's inequality). Let $X_1, \ldots, X_k$ be independent random variables with
$X_i \in [a, b]$. Let $Y = (1/k) \sum_{i=1}^k X_i$. Then

$$\Pr(|Y - E[Y]| \geq t) \leq 2 \exp\left(-\frac{2 k t^2}{(b - a)^2}\right).$$
Lemma 14 (Bernstein's inequality). Let $X_1, \ldots, X_k$ be independent random variables with
equal variance $\mathrm{Var}[X_i] = \sigma^2$. Let $Y = (1/k) \sum_{i=1}^k X_i$. Then

P.(|r-Elrl|>*)<2exp(-^^-^]. 